
ISA Transactions (2021), https://doi.org/10.1016/j.isatra.2021.02.042
Contents lists available at ScienceDirect. Journal homepage: www.elsevier.com/locate/isatrans

Research article

Intelligent fault diagnosis of machines with small & imbalanced data: A state-of-the-art review and possible extensions

Tianci Zhang a, Jinglong Chen a,*, Fudong Li a, Kaiyu Zhang a, Haixin Lv a, Shuilong He b,*, Enyong Xu c,d

a State Key Laboratory for Manufacturing and Systems Engineering, Xi'an Jiaotong University, Xi'an 710049, PR China
b School of Mechanical and Electrical Engineering, Guilin University of Electronic Technology, Guilin 541004, China
c School of Mechanical Science and Engineering, Huazhong University of Science and Technology, Wuhan 430074, China
d Dongfeng Liuzhou Motor Co., Ltd., Liuzhou 545005, China

* Corresponding authors. E-mail addresses: jlstrive2008@mail.xjtu.edu.cn (J. Chen), xiaofeilonghe@guet.edu.cn (S. He).

Article history: Received 22 October 2020; Received in revised form 24 February 2021; Accepted 24 February 2021.

Keywords: Intelligent fault diagnosis; Small & imbalanced data; Data augmentation; Feature learning; Classifier design; Meta-learning; Zero-shot learning

Abstract: Research on intelligent fault diagnosis has yielded remarkable achievements based on artificial intelligence-related technologies. In engineering scenarios, machines usually work in a normal condition, which means that only limited fault data can be collected. Intelligent fault diagnosis with small & imbalanced data (S&I-IFD), which refers to building intelligent diagnosis models from limited machine faulty samples to achieve accurate fault identification, has been attracting the attention of researchers. Nowadays, research on S&I-IFD has achieved fruitful results, but a review of the latest achievements is still lacking, and future research directions are not clear enough. To address this, we review the research results on S&I-IFD and provide some future perspectives in this paper. The existing research results are divided into three categories: the data augmentation-based, the feature learning-based, and the classifier design-based strategies. The data augmentation-based strategy improves the performance of diagnosis models by augmenting the training data. The feature learning-based strategy identifies faults accurately by extracting features from small & imbalanced data. The classifier design-based strategy achieves high diagnosis accuracy by constructing classifiers suitable for small & imbalanced data. Finally, this paper points out the research challenges faced by S&I-IFD and provides some directions that may bring breakthroughs, including meta-learning and zero-shot learning.

© 2021 ISA. Published by Elsevier Ltd. All rights reserved.

1. Introduction

Fault diagnosis plays an essential role in machine health management, as it builds a bridge between machine monitoring data and machine health conditions. Intelligent fault diagnosis applies artificial intelligence technologies to the fault diagnosis process to make it intelligent and automatic [1]. Recently, deep neural networks such as the deep auto-encoder (DAE) [2,3], the deep convolutional neural network (DCNN) [4,5], and other deep networks [6,7] have been widely used to build end-to-end intelligent diagnosis models, which reduces the dependence on manual labor and expert knowledge and greatly promotes the development of intelligent fault diagnosis [8].

Intelligent fault diagnosis with small & imbalanced data (S&I-IFD) refers to building intelligent diagnosis models using a few machine faulty samples to achieve accurate fault identification. Generally speaking, intelligent diagnosis models with deep networks are built on the analysis of sufficient machine monitoring data [8]. The more sufficient the training data and the more abundant the fault types in the training set, the higher the diagnosis accuracy of intelligent diagnosis models. However, in engineering scenarios, it is difficult to build an ideal dataset for the training of intelligent diagnosis models, for the following three reasons.

(1) In engineering scenarios, machines usually work in a normal condition and faults are rare. Therefore, although a condition monitoring system composed of multiple sensors can collect data from machines constantly, the majority of the collected data is healthy data, and the volume of the fault data is small. Thus, it is hard to obtain sufficient fault data from engineering scenarios directly to support the training of intelligent diagnosis models.

(2) It is expensive to carry out fault simulation experiments to collect machine fault data in the laboratory. For example, to obtain fault data of gears in the laboratory, researchers need to purchase gear specimens and manufacture faults artificially by wire-electrode cutting or other means. Moreover, it is necessary to build a fault simulation test bench to collect data. Such an experiment is not only expensive but also consumes a lot of human labor. Besides, some common faults, like gear tooth surface bonding, are difficult to simulate by artificial fault manufacturing. Thus, it is difficult to collect fault data by conducting fault simulation experiments in the laboratory.

(3) The fault data obtained by computer simulation is not practical enough. Some fault simulation software can simulate faults of equipment and output fault data. For example, Gasturb is a performance calculation software for aero-engines [9]. Researchers use Gasturb to simulate faults of aero-engines to obtain fault data. However, although Gasturb can perform precise mathematical operations, it cannot simulate the complex working environment of aero-engines. Different working environments and working conditions have a significant impact on fault data. Therefore, the fault data obtained by simulation is usually not practical enough.


In short, intelligent fault diagnosis in engineering scenarios is a typical small & imbalanced data problem. In this case, if the intelligent diagnosis model is trained with limited fault data directly, it is prone to poor generalization performance and low fault identification accuracy. Therefore, the lack of fault samples makes it difficult to build an effective intelligent diagnosis model and achieve accurate fault identification in engineering scenarios.

How to solve the S&I-IFD problem has long been a research interest of scholars. For example, some researchers use the Synthetic Minority Over-sampling Technique [10] to expand the number of faulty samples or develop fault classifiers with Support Vector Machines [11], so that diagnosis models can have relatively high identification accuracies under the condition of insufficient fault data samples. Recently, the research on S&I-IFD has yielded fruitful achievements with new machine learning algorithms. For instance, researchers use generative adversarial networks (GAN) to emulate the data distributions of machine faulty samples so that more faulty samples are generated to expand the limited fault dataset [12]. Besides, transfer learning-related diagnosis models reuse previously learned diagnosis knowledge in new diagnosis tasks, so that accurate fault identification can also be achieved using a few faulty samples [13].

At present, there have been many research achievements on S&I-IFD; however, the research directions for future development are not clear enough, and a review of the existing results is still lacking. Although some reviews about intelligent fault diagnosis have been published, these reviews mainly aim at the utilization of a particular theory, like deep learning, for specific objects, like induction motors [14,15], not at the problem of lacking fault data samples. There is no doubt that small & imbalanced data learning is a common problem in many areas of the real world, such as the medical and financial fields [16]. For example, the detection of invalid transactions and financial fraud in the trading systems of banks is also a typical small & imbalanced data problem. Therefore, many reviews on imbalanced data classification have also been published [16-19]. However, these existing reviews pay little attention to new machine learning theories and algorithms like GAN and transfer learning, which have been widely applied to S&I-IFD in recent years. Moreover, these existing reviews are mainly summaries of research methods and do not take mechanical equipment as a special research object. From the perspective of data analysis, the analysis of machine monitoring data often involves frequency domain analysis and the like, which differs from other data analysis such as image data analysis. Besides, as far as the authors know, similar review papers for S&I-IFD are neither under consideration nor already published in another venue. Therefore, it is necessary to present a review of S&I-IFD to summarize the existing achievements and give some future directions for further exploration.

This paper provides a review of S&I-IFD. The contributions of this paper include two aspects. First, this paper focuses on the small & imbalanced data problem in intelligent machine fault diagnosis, which is a significant research point for which a related review is still lacking. Taking mechanical equipment as the research object, this paper reviews the related work on S&I-IFD in the past 10 years and focuses on the latest research results represented by GAN and transfer learning. Different from other reviews on small & imbalanced data learning [16-19], this paper divides the achievements of S&I-IFD into three categories according to the general process of machine fault diagnosis (MFD), as shown in Fig. 1: the data augmentation-based strategy, the feature learning-based strategy, and the classifier design-based strategy. In particular, MFD contains three main stages: data preprocessing, feature extraction, and condition classification [1]. For S&I-IFD, solutions can also be found in these three steps, as shown in Fig. 1. From the perspective of data preprocessing, scholars augment the limited fault data through data generation or data over-sampling, and the augmented data can be used directly to train intelligent diagnosis models. In terms of feature extraction, fault features can be learned from limited fault data directly by designing regularized neural networks or by feature adaptation, without data augmentation. In the aspect of condition classification, the health conditions of machines can be classified directly by designing fault classifiers suitable for small & imbalanced data, without data augmentation or the design of feature extraction models. Compared with the other reviews on small & imbalanced data learning [16-19], the presented review has stronger field characteristics due to this classification mode of the research achievements. As a result, this paper may be more enlightening for researchers in this field.

Second, based on the existing research results and the latest machine learning theories, this paper provides some research challenges and directions for further development. Specifically, in the aspect of data augmentation, current research has mainly focused on expanding the number of fault samples, while how to measure and enhance the samples' quality needs more attention. How to prevent negative transfer in diagnosis models is a key to their application in engineering scenarios. Besides, as a new machine learning theory, meta-learning [20] has initially shown its advantages in dealing with small sample problems; thus, the applications of meta-learning theory to S&I-IFD may increase greatly. Finally, zero-shot learning [21] may bring a breakthrough for S&I-IFD in the extreme case where no fault samples are available at all.

For the rest of this review, Section 2 describes the research methodology and the initial data analysis. Sections 3, 4, and 5 review the research achievements from the perspectives of data augmentation, feature learning, and classifier design, respectively. Section 6 gives some possible extensions for S&I-IFD in the future. Section 7 presents a conclusion for this review.

2. Research methodology and initial analysis

2.1. Research methodology

This paper mainly searched and collected the publications on S&I-IFD published from 2010 to November 2020. Four library databases covering the natural science research field were selected for the literature search: Science Direct, IEEE Xplore, Springer, and ACM. Besides, Scopus and the Web of Science were also used to search for papers from some individual publishers [22].

Fig. 1. The process of machine fault diagnosis and the three strategies for S&I-IFD.

Fig. 2. The two-level keywords tree.

Fig. 3. The publishing trends of S&I-IFD.

Inspired by [16], a two-level keywords tree was constructed to collect published papers on S&I-IFD as comprehensively as possible, as given in Fig. 2. Since this paper reviews intelligent fault diagnosis in the small & imbalanced data case, the search keyword of the first level was restricted to intelligent fault diagnosis. For S&I-IFD, some scholars regard it as a problem of imbalanced data classification [23,24], because the volume of health data is larger than that of fault data. On the other hand, some scholars regard it as a problem of small sample classification [12,25,26]; that is, the volume of health data is set to be the same as that of the fault data to avoid the problem of data distribution imbalance. Therefore, the search keywords of the second level were divided into two parts, i.e., small sample learning and imbalanced data learning, as shown in Fig. 2. A total of 249 English journal papers were collected in the initial search. After further review, 145 papers were related to the theme of this paper, and these are the main data source of this review. Besides, in the citations of these papers, we found 9 related conference papers and included them in the references for this review.

In the process of the literature search, some related literature may have been missed due to inaccurate or incomplete keywords. For example, some scholars refer to imbalanced data as ''skewed data'' [16]. We did not list ''skewed data'' as a search keyword, which is the main limitation and threat to the validity of the literature search.

2.2. Initial analysis

Fig. 3 shows the number of S&I-IFD-related publications in 2010-2020. It can be seen that there were few English journal papers about S&I-IFD from 2010 to 2015, while the number of published papers has increased rapidly since 2016, mainly due to the emergence and application of new machine learning models like GAN [12]. The trends in Fig. 3 show that S&I-IFD is a valuable research problem and may continue to be a research hotspot in the next few years.

After a careful review, the collected papers are classified into the data augmentation-based strategy, the feature learning-based strategy, and the classifier design-based strategy, as shown in Fig. 1. Inspired by the general process of machine fault diagnosis, the classification mode of the collected papers in this paper has stronger field characteristics than that in the existing related reviews [16-19]. Specifically, in the aspect of data augmentation, data generation and data over-sampling models can effectively expand the fault dataset [25,27,28], and data reweighting methods based on transfer learning can also augment the limited fault data with the help of other related datasets [13,29]. The research achievements indicate that augmented data improve the diagnosis accuracies in S&I-IFD effectively. In the aspect of feature learning, fault features can be extracted directly from small & imbalanced data by designing regularized neural networks [23,30,31], and feature adaptation based on transfer learning is also useful for learning features from limited fault data to achieve accurate fault identification [32-34]. In the aspect of classifier design, accurate fault identification is expected to be achieved by modifying SVM or designing a cost-sensitive fault classifier [35-40]. Besides, the classifier design scheme based on parameter-transfer learning also shows effectiveness in the case of limited fault samples [41-43].

3. Data augmentation-based strategy for S&I-IFD

3.1. Motivation

Data-driven intelligent fault diagnosis has been widely studied. Research results have demonstrated that data-driven intelligent fault diagnosis models can usually achieve good diagnosis performance [44]. However, in engineering scenarios, machine faulty samples are hard to collect, which is an important factor restricting the utilization of data-driven intelligent diagnosis models. As an efficient approach to enhancing the generalization performance of neural networks, data augmentation [17] presents a good solution for S&I-IFD. Starting from a few faulty data samples, the limited fault dataset can be augmented by data generation [45,46], data over-sampling [47,48], or data reweighting [13,29] to train the intelligent diagnosis models effectively. As a result, intelligent diagnosis models are expected to have strong diagnosis ability in the case of lacking fault data samples.

3.2. Data generation using generative models

Recently, data generation models represented by generative adversarial networks (GAN) [49] and the Variational Auto-Encoder (VAE) [50] have been studied deeply and have shown bright results in many fields [51]. Fortunately, these generative models can also be used to generate mechanical signals, providing a powerful tool for data augmentation in S&I-IFD [52].

3.2.1. GAN-Based methods

3.2.1.1. Introduction to GAN. GAN has two multi-layer neural network modules named Generator and Discriminator, as depicted in Fig. 4. Generator samples random noise z from a distribution p_z and then generates data x_g, while Discriminator outputs a probability scalar to distinguish the real data x_r from the generated data x_g. Given that G(\cdot) is the operation in Generator and D(\cdot) is the operation in Discriminator, the objective function of Generator is

L_G = E_{z \sim p_z}[\log(1 - D(G(z)))]    (1)

Fig. 4. Structure of GAN.

For Discriminator, the objective function is

L_D = -E_{x \sim p_r}[\log D(x)] - E_{z \sim p_z}[\log(1 - D(G(z)))]    (2)

where p_r represents the real data distribution.

As a result, GAN has the following overall objective function:

\min_G \max_D W_{G,D} = E_{x \sim p_r}[\log D(x)] + E_{z \sim p_z}[\log(1 - D(G(z)))]    (3)

The training process of GAN alternates between the two modules: with Generator fixed, Discriminator is updated to distinguish the real samples from the generated ones; then, with Discriminator fixed, Generator is updated so that its generated samples are judged as real. The two steps are repeated until the generated data distribution approximates the real one.
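To make this alternating scheme concrete, the following minimal sketch implements Eqs. (1)-(3) in PyTorch for 1-D signals; the layer sizes, the 1024-point segment length, and the optimizer settings are illustrative assumptions, not values from the reviewed works.

```python
# Minimal sketch of alternating GAN training (Eqs. (1)-(3)), assuming
# 1-D fault signals of length 1024 scaled to [-1, 1].
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(100, 256), nn.ReLU(), nn.Linear(256, 1024), nn.Tanh())
D = nn.Sequential(nn.Linear(1024, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def train_step(x_real):                 # x_real: (batch, 1024) faulty samples
    b = x_real.size(0)
    z = torch.randn(b, 100)
    # Step 1: fix G, update D to tell real from generated (Eq. (2)).
    d_loss = bce(D(x_real), torch.ones(b, 1)) + bce(D(G(z).detach()), torch.zeros(b, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Step 2: fix D, update G so generated samples are judged real.
    # This is the commonly used non-saturating form of Eq. (1), which
    # gives stronger gradients early in training.
    g_loss = bce(D(G(z)), torch.ones(b, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```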

Based on the original GAN, scholars have made many improvements and created many variants since its birth. For example, the Deep Convolutional GAN (DCGAN) [53] uses deep convolutional neural networks to build Generator and Discriminator, which makes it possible to generate high-quality images. The Wasserstein GAN (WGAN) [54] applies the Wasserstein distance to modify the original loss function, which makes the training process more stable than that of the original GAN. The Wasserstein GAN with Gradient Penalty (WGAN-GP) [55] applies a gradient penalty to Discriminator to further stabilize the training process. The Conditional GAN (CGAN) [56] introduces the class information of the real data into the training of GAN, which enables the model to generate labeled data samples. The Auxiliary Classifier GAN (ACGAN) [57] adds a classifier to Discriminator to generate labeled data samples. The Semi-supervised GAN (SSGAN) [58] realizes semi-supervised data classification by constructing pseudo labels for the unlabeled data samples. The Information maximizing GAN (infoGAN) [59] can learn disentangled feature representations by inputting a latent code into Generator, so that the learned features are interpretable.

For simplicity, we use G and D to represent Generator and Discriminator, and Q represents a classifier. c is the class information of the input data, k is the class number, and \lambda and \alpha are real numbers less than 1. c' and c'' denote the input latent code and the reconstructed latent code, and L_I(\cdot) represents the calculation of mutual information. As shown in Fig. 5, we summarize several common variants of GAN, and their objective functions are given in Table 1.

Fig. 5. Variants of GAN.

Table 1. The objective functions of the variants of GAN.

DCGAN:
  L_D^{DCGAN} = -E_{x \sim p_r}[\log D(x)] - E_{z \sim p_z}[\log(1 - D(G(z)))]
  L_G^{DCGAN} = E_{z \sim p_z}[\log(1 - D(G(z)))]

WGAN:
  L_D^{WGAN} = -E_{x \sim p_r}[D(x)] + E_{z \sim p_z}[D(G(z))]
  L_G^{WGAN} = -E_{z \sim p_z}[D(G(z))]

WGAN-GP:
  L_D^{WGAN-GP} = L_D^{WGAN} + \lambda E_{(x,z) \sim (p_r, p_z)}[(\|\nabla D(\alpha x + (1 - \alpha) G(z))\| - 1)^2]
  L_G^{WGAN-GP} = L_G^{WGAN}

CGAN:
  L_D^{CGAN} = -E_{x \sim p_r}[\log D(x, c)] - E_{z \sim p_z}[\log(1 - D(G(z), c))]
  L_G^{CGAN} = E_{z \sim p_z}[\log(1 - D(G(z), c))]

ACGAN:
  L_D^{ACGAN} = L_D^{DCGAN} - E_{x \sim p_r}[P(class = c | x)] - E_{z \sim p_z}[P(class = c | G(z))]
  L_G^{ACGAN} = L_G^{DCGAN} - E_{z \sim p_z}[P(class = c | G(z))]

SSGAN:
  L_D^{SSGAN} = L_D^{WGAN} - E_{x \sim p_r}[P(class = c | x, c < k + 1)] + \|E_{x \sim p_r} f(x) - E_{z \sim p_z} f(G(z))\|^2
  L_G^{SSGAN} = L_G^{WGAN}

infoGAN:
  L_D^{infoGAN} = L_D^{DCGAN} - \lambda L_I(c', c'')
  L_G^{infoGAN} = L_G^{DCGAN} - \lambda L_I(c', c'')
Table 2. Applications of GAN to generate data in S&I-IFD.

1-D, raw signal:
  GAN/WGAN/WGAN-GP: Zhang et al. [12], Liu et al. [27], Yin et al. [60], Gao et al. [61], Zhang et al. [62], Zhang et al. [63]
  ACGAN: Shao et al. [52]
  infoGAN: Wu et al. [64]

1-D, frequency spectrum:
  GAN/WGAN/WGAN-GP: Wang et al. [65], Zou et al. [66], Wang et al. [67], Ding et al. [68], Mao et al. [69]
  CGAN: Wang et al. [70], Zheng et al. [71], Zheng et al. [72]
  ACGAN: Li et al. [73]

1-D, extracted feature:
  GAN/WGAN/WGAN-GP: Pan et al. [46], Zhou et al. [74]

2-D, time-frequency spectrum:
  GAN/WGAN/WGAN-GP: Cabrera et al. [75]
  CGAN: Liu et al. [26], Yu et al. [45]
  SSGAN: Liang et al. [76]
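As a concrete example of the objectives in Table 1, the sketch below computes the WGAN-GP discriminator loss, whose gradient penalty is evaluated on points interpolated between real and generated samples; the tensor shapes and the penalty weight of 10 are assumptions for illustration.

```python
# Sketch of the WGAN-GP discriminator loss from Table 1. `critic` is any
# network without a final sigmoid; x_real and x_fake are (batch, features).
import torch

def wgan_gp_d_loss(critic, x_real, x_fake, lam=10.0):
    alpha = torch.rand(x_real.size(0), 1)          # per-sample interpolation weight
    x_hat = (alpha * x_real + (1 - alpha) * x_fake).requires_grad_(True)
    # Gradient of the critic output with respect to the interpolated points.
    grad = torch.autograd.grad(critic(x_hat).sum(), x_hat, create_graph=True)[0]
    penalty = ((grad.norm(2, dim=1) - 1) ** 2).mean()
    # Wasserstein critic loss plus the gradient penalty term.
    return critic(x_fake).mean() - critic(x_real).mean() + lam * penalty
```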

3.2.1.2. Applications of GAN to data generation. The applications of GAN to generate data for S&I-IFD are summarized in Table 2. The research achievements show that the fault data augmented by GAN can effectively improve the fault identification performance for gears [65], bearings [66], rotors [52], and other components [67] in the case of limited fault data. According to the data dimension, these research results can be divided into two categories: one-dimensional (1-D) sample generation and two-dimensional (2-D) sample generation. Among them, the generation of 1-D data can be classified into three types. The first is to generate raw signals directly [12,27,52,60-64,77]. GAN and its variants are applied to generate the monitoring signals of machines, and the generated signals can be used to train the intelligent diagnosis models directly. For example, Zhang et al. [12] used a deep gradient-penalized GAN to generate bearings' vibration data, which expands training datasets effectively. The work in [12] was among the earliest research using GAN for mechanical signal augmentation, and it also designed an index based on correlation coefficients to measure the generated samples' quality. The second is to generate the frequency spectrum of the monitoring signals [65-73]. Compared with raw monitoring data, the frequency spectrum also contains abundant fault information and is widely used in machine fault identification. For example, Wang et al. [65] adopted GAN to generate the gearbox's signal frequency spectrums. The generated frequency spectrums were used together with the real ones to train a Stacked Auto-encoder (SAE), which achieves high diagnostic accuracy and good anti-noise ability. The third is to generate extracted data features [46,74]. The generated fault features can also be used to train the fault classifier directly. For instance, Zhou et al. [74] used an auto-encoder (AE) to extract fault features from monitoring data, and the extracted features were generated by a global optimization GAN. The generated and the real fault features were used for accurate fault identification by deep neural networks. Since the dimension of features is generally lower than that of raw data, the generation of data features is easier and faster than that of raw data. However, the fault information contained in the generated features may not be as rich as that in the raw data, which is one of the drawbacks of fault feature generation.

On the other hand, GAN was originally used for 2-D image generation; therefore, it is handy for processing 2-D data. In the field of machine fault diagnosis, researchers usually use the wavelet transform (WT) and other methods [45,75,76] to obtain the time-frequency domain features of raw signals, which are 2-D data. GAN can generate time-frequency features of raw monitoring signals to serve the training of intelligent diagnosis models. Cabrera et al. [75] presented a deep diagnosis scheme based on GAN for imbalanced fault diagnosis, in which the 2-D time-frequency features are extracted using the wavelet packet transform and augmented by GAN. Liang et al. [76] used the continuous wavelet transform to extract time-frequency features of gearboxes' vibration data, and a GAN was adapted to expand the number of 2-D time-frequency features to train the diagnosis model.

As a popular data generation method, GAN has the ability to generate faulty samples similar to the real faulty samples collected from engineering scenarios, thus expanding the training dataset of the intelligent diagnosis model. However, there are still two problems when GAN is applied to fault data generation. First, GAN is difficult to train. In order to generate sufficient fault data, GAN consumes a large amount of computing resources and needs a long training time. Second, although GAN can expand the volume of fault data, its data generation ability is limited when the training data is insufficient. Specifically, the original GAN needs massive data for training. The more training data there is, the closer the data distribution learned by GAN is to the real data distribution. However, when only a few training samples are available, the training easily falls into mode collapse [55]. In this case, the generated samples approximate copies of the real samples, which means that the fault information contained in the generated data is very limited. As a result, the fault identification accuracy of a diagnosis model trained with such low-quality generated samples cannot meet the requirements of engineering. Therefore, although many achievements have been yielded using GAN, there is huge research space on how to reduce the consumption of computing time and improve the data generation ability when the training data is insufficient.

3.2.2. VAE-Based methods

3.2.2.1. Introduction to VAE. The Variational Auto-Encoder (VAE) [50] is another commonly used deep generative model, as shown in Fig. 6. In terms of data generation, VAE can sample from hidden variables and then generate more data.

Fig. 6. Structure of VAE.

The input of the encoder is the data x, and the output is the hidden variable z, which is composed of \mu and \sigma; the weights and biases of the encoder are \theta. In training, the posterior distribution q_\theta(z|x) is learned by the encoder. The hidden variable z is input into the decoder to reconstruct the data, and the weights and biases of the decoder are \vartheta. The distribution p_\vartheta(x|z) is learned by the decoder.

The objective function can be expressed as

L_i(\theta, \vartheta) = -E_{z \sim q_\theta(z|x_i)}[\log p_\vartheta(x_i|z)] + KL(q_\theta(z|x_i) \| p(z))    (4)

where p(z) is the prior distribution of the hidden variable and KL(\cdot) denotes the Kullback-Leibler divergence. In VAE, p(z) is the normal distribution N(z; 0, 1), and q_\theta(z|x_i) is the normal distribution N(z; \mu_i, \sigma_i^2). Thus, the KL divergence between q_\theta(z|x_i) and p(z) can be described as

KL(q_\theta(z|x_i) \| p(z)) = -\frac{1}{2} \sum_{j=1}^{J} \left( 1 + \log (\sigma_i^j)^2 - (\mu_i^j)^2 - (\sigma_i^j)^2 \right)    (5)

where J is the dimension of the hidden variable z.

In Eq. (5), \mu_i and \sigma_i can be computed by the encoder directly. The hidden variable z is calculated by

z_i = \mu_i + \sigma_i \varepsilon    (6)

where \varepsilon \sim N(0, 1) is a noise variable, as given in Fig. 6.

In VAE, the output data has a high similarity to the input because the data reconstruction loss is optimized in the training process. Meanwhile, due to the addition of the noise variable \varepsilon, the generated data will not be completely consistent with the input data, thus achieving data augmentation.
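A minimal sketch ties Eqs. (4)-(6) together, assuming 1024-point signal segments and a 20-dimensional hidden variable; the layer sizes are illustrative.

```python
# Minimal VAE sketch: the encoder outputs mu and log(sigma^2), z is sampled
# with the reparameterization trick of Eq. (6), and the loss is the
# reconstruction error plus the closed-form KL term of Eq. (5).
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, d_in=1024, d_z=20):
        super().__init__()
        self.enc = nn.Linear(d_in, 256)
        self.mu, self.logvar = nn.Linear(256, d_z), nn.Linear(256, d_z)
        self.dec = nn.Sequential(nn.Linear(d_z, 256), nn.ReLU(), nn.Linear(256, d_in))

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)   # Eq. (6)
        return self.dec(z), mu, logvar

def vae_loss(x, x_rec, mu, logvar):
    rec = F.mse_loss(x_rec, x, reduction='sum')                   # reconstruction term
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())  # Eq. (5)
    return rec + kl
```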
3.2.2.2. Applications of VAE to data generation. In intelligent fault diagnosis, VAE has been utilized to generate fault data of gearboxes [70] and bearings [25,78]. For example, in [70], a diagnosis scheme based on VAE and GAN was proposed for imbalanced fault diagnosis, in which VAE was applied to generate the frequency spectrums of a gearbox in different working conditions. Different from the traditional GAN, this scheme used VAE as the data generator and further improved the data generation ability of VAE through adversarial training. Dixit et al. [25] adopted a Conditional Variational Auto-encoder (CVAE) to generate faulty data of bearings, in which a centroid loss term was added to the original loss function of VAE. Zhao et al. [78] proposed an intelligent diagnosis model suitable for small and unbalanced monitoring data, in which a VAE was used to generate the vibration signals of machines. The signals generated by VAE had high similarity to the real signals in the time-frequency domain, which made it possible for the proposed diagnosis method to obtain higher accuracy than related works.

Similar to GAN, VAE can also be used for fault data generation, and the research achievements above have proved the effectiveness of VAE in S&I-IFD. Compared with GAN, the training process of VAE is more stable, and there is no problem of mode collapse [79]. However, due to the difference in the loss function, the data generated by VAE is usually not as realistic as the data generated by GAN [80]. As a result, the application of GAN to data augmentation is more popular than that of VAE [80]. Some scholars have tried to combine VAE and GAN to generate mechanical data [70]. In the future, how to make the data samples generated by VAE more realistic is a problem that needs to be solved.

3.3. Data over-sampling using sampling techniques

Although deep generative models like GAN and VAE can generate fault data to support the training of intelligent diagnosis models, these deep generative models are often difficult to train and require a large amount of computing resources [51]. Taking this problem into account, data over-sampling using sampling techniques is another important way to augment limited data [19]. Sampling techniques like the Synthetic Minority Over-sampling Technique (SMOTE) [10] have yielded many achievements in S&I-IFD.

3.3.1. SMOTE-Based methods

3.3.1.1. Introduction to SMOTE. In general, researchers over-sample the minority classes or under-sample the majority classes to balance the dataset [19]. However, under-sampling loses some valuable information that might be useful for data classification. On the other hand, random over-sampling replicates training data, which may lead to overfitting of classifiers [18]. Based on random over-sampling, an improved method named SMOTE was proposed [81]. By analyzing the samples in the minority classes, SMOTE can synthesize new samples. As given in Fig. 7, the process of SMOTE is described as follows (a code sketch follows the three steps):

(1) The Euclidean distance between the sample x and all the samples in the same class is calculated to obtain the k-nearest neighbors.

(2) For each sample x, n samples {x_i}_{i=1}^{n} are randomly chosen within the range of the k-nearest neighbors.

(3) For each sample x_i, the new synthesized sample x_{new} is obtained as follows:

x_{new} = x + rand(0, 1) \cdot (x_i - x)    (7)
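The three steps above can be sketched as follows, assuming the minority-class samples are the rows of a NumPy array; production implementations such as imblearn.over_sampling.SMOTE from the imbalanced-learn library handle the bookkeeping more carefully.

```python
# Sketch of the three SMOTE steps and Eq. (7) for a minority class x_min.
import numpy as np

def smote(x_min, n_new, k=5, rng=np.random.default_rng(0)):
    synth = []
    for _ in range(n_new):
        i = rng.integers(len(x_min))
        x = x_min[i]
        # Step 1: k-nearest neighbours of x inside the minority class.
        d = np.linalg.norm(x_min - x, axis=1)
        nn_idx = np.argsort(d)[1:k + 1]        # skip x itself (distance 0)
        # Step 2: pick one neighbour at random.
        xi = x_min[rng.choice(nn_idx)]
        # Step 3: interpolate between x and the neighbour (Eq. (7)).
        synth.append(x + rng.random() * (xi - x))
    return np.array(synth)
```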

Fig. 7. New samples synthesized using SMOTE.

3.3.1.2. Applications of SMOTE to data over-sampling. Some scholars [28,47,48,82] have introduced SMOTE and its modified variants to over-sample machine faulty samples. For example, Martin-Diaz et al. [10] used SMOTE to synthesize fault samples of induction motors (IMs), in which the stator current signals in the minority classes were synthesized to balance the dataset. The results indicated that the balanced data constructed by SMOTE could help to improve the diagnosis performance effectively. An effective imbalanced data learning scheme named Easy-SMT was presented in [82]. Easy-SMT used SMOTE to augment the minority fault classes of wind turbines and the Easy-Ensemble algorithm to transfer the imbalanced fault classification problem to a balanced one, making it possible to achieve good diagnosis performance. In [47], Wu et al. proposed an expectation-maximization minority over-sampling method based on SMOTE, in which a local-weighted strategy was applied to the expectation-maximization algorithm to learn and identify the hard-to-learn informative fault samples.

Compared with deep generative models, SMOTE requires fewer computing resources, so it is able to synthesize a large quantity of fault data samples to meet the demand of intelligent diagnosis models. However, SMOTE has the problem of distributional marginalization when it is applied to synthesize data in the minority class. Specifically, if a fault sample is at the edge of the fault data distribution, the samples synthesized from this fault sample will also be at the edge of the distribution, which will blur the classification boundary [83]. Therefore, although SMOTE improves the balance of the training dataset, it may increase the difficulty of fault classification when it falls into distributional marginalization.

3.4. Data reweighting using transfer learning

In addition to data generation and data over-sampling, data augmentation can also be achieved by reweighting data samples using transfer learning-based approaches with the help of other related datasets [13,29,84].

3.4.1. Introduction to transfer learning

In the case of lacking fault data, it is difficult to train a new intelligent diagnosis model [85]. However, this problem could be solved if the existing diagnosis knowledge learned by a trained diagnosis model could be reused. For example, we can use the bearing fault data collected in the laboratory to train a diagnosis model. The bearing fault diagnosis knowledge learned by this diagnosis model may be helpful for bearing fault identification in engineering scenarios. Transfer learning, which means that the knowledge learned from one task is reused in another task, is a promising tool for achieving this goal [86].

Generally speaking, transfer learning has three categories depending on the components being transferred: instance-based transferring, feature-based transferring, and parameter-based transferring [86]. Among them, instance-based transferring aims to select some data samples from the source domain to improve the target task's performance in the case of limited target samples. Data reweighting is one of the most commonly used strategies of instance-based transferring: the weights of the selected target domain data samples are increased, while the weights of the selected source domain ones are decreased. TrAdaBoost [87] is the most representative data reweighting algorithm in transfer learning.

3.4.2. TrAdaBoost-based methods

The source domain samples and the target domain ones are reweighted by TrAdaBoost so that the contributions of the source and the target domain samples to the diagnosis model training can be balanced. In TrAdaBoost, if a target domain sample is misclassified by the diagnosis model, the weight of this sample is increased, because this sample is hard to classify correctly. On the other hand, if a source domain sample is misclassified by the diagnosis model, the weight of this sample is decreased, because this sample is considered to be of little help to the training of the diagnosis model. Consequently, the classification boundary is moved in the direction of accurately identifying the target data, as given in Fig. 8. As a result, a diagnosis model based on the TrAdaBoost algorithm will have good classification accuracy on the target diagnosis task.

In intelligent fault diagnosis, the TrAdaBoost algorithm has been used to handle the small sample condition. For example, Xiao et al. [13] presented a transfer learning scheme for machine fault diagnosis under the small sample condition, in which a TrAdaBoost algorithm was applied to assign weights to each training sample. The weighted samples helped to train a convolutional neural network-based learner. The proposed scheme obtained the highest diagnosis accuracy compared with related works in the case of inadequate target data. Shen et al. [29] applied the TrAdaBoost algorithm to update the weights of the selected auxiliary samples, and the experimental results showed that the presented work was effective in bearing fault identification using small target data samples.

As a data reweighting algorithm, TrAdaBoost only operates on the data and does not participate in feature extraction or condition identification. Therefore, it is easy to combine with various advanced data classification models like deep belief networks and convolutional neural networks. However, the performance of data reweighting is connected with the similarity of the source and the target domain data distributions; if there is a large deviation between them, the TrAdaBoost-based data reweighting strategy may lead to negative transfer in the target diagnosis task [8], which means the reweighted fault samples may lead to poor diagnosis performance.
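A sketch of one TrAdaBoost reweighting round, following the update rule described above; the factor formulas follow the original TrAdaBoost algorithm [87], while the simple array interface (boolean error masks) is an assumption for illustration.

```python
# One TrAdaBoost round: weights of misclassified source samples shrink,
# weights of misclassified target samples grow.
import numpy as np

def tradaboost_reweight(w_src, w_tgt, err_src, err_tgt, n_rounds):
    eps = np.sum(w_tgt * err_tgt) / np.sum(w_tgt)   # weighted target error
    eps = min(eps, 0.499)                           # keep the factor well defined
    beta_t = eps / (1.0 - eps)                      # target-domain factor (< 1)
    beta = 1.0 / (1.0 + np.sqrt(2.0 * np.log(len(w_src)) / n_rounds))
    w_src = w_src * beta ** err_src                 # shrink wrong source weights
    w_tgt = w_tgt * beta_t ** (-err_tgt.astype(float))  # grow wrong target weights
    return w_src, w_tgt
```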
3.5. Epilog

This section reviewed the research results using the data augmentation-based strategy in S&I-IFD. The data augmentation-based strategy has three categories: data generation using generative models, data over-sampling using sampling techniques, and data reweighting using transfer learning. The first two methods can expand the volume of fault data effectively. However, they have the following two problems to be solved. First, deep generative models like GAN and VAE are often difficult to train and require many computing resources, which means they are not friendly to practical application. Moreover, when only a few samples are available for training, the generated faulty samples' quality is too low to meet the requirements of intelligent fault diagnosis models, because these deep generative models usually need massive data to learn an authentic data distribution. Second, the sampling techniques represented by SMOTE have the problem of distributional marginalization, which may even increase the difficulty of accurate fault classification. Based on transfer learning, data reweighting can also augment limited fault data samples by increasing the selected data samples' weights with the help of other related datasets. However, data reweighting relies on the similarity of the source and the target domain data distributions, which is prone to reduce the performance of the diagnosis model. Therefore, it is necessary to find new data augmentation methods with high efficiency to further improve the diagnosis performance on S&I-IFD.

Fig. 8. Illustration of TrAdaBoost: (a) the diagnosis model training with the source and the target domain samples directly, and (b) the diagnosis model training
based on TrAdaBoost algorithm.

4. Feature learning-based strategy for S&I-IFD

4.1. Motivation

In intelligent fault diagnosis, fault feature learning from machine monitoring data is the core link. The quality of the learned fault features affects the performance of machine fault diagnosis to a great extent. In addition to data augmentation, the S&I-IFD problem can also be solved if diagnosis models can learn effective fault features from small & imbalanced data. Scholars have done much work on how to learn fault features from small & imbalanced data. According to the existing results, the research ideas are mainly divided into the following two kinds. First, by designing regularized neural networks like sparse ones [23,30,31], diagnosis models can extract fault features from small & imbalanced data directly. Second, with the help of other related datasets, feature adaptation based on transfer learning can also learn fault features from small & imbalanced data to achieve accurate fault identification [32-34].

4.2. Feature extraction using regularized neural networks

The use of neural networks for fault feature extraction from monitoring data has been studied deeply. Recent research achievements show that regularized neural networks can process small & imbalanced data effectively [88-91]. Moreover, in these achievements, deep auto-encoders (DAE) and deep convolutional neural networks (DCNN) are favored as basic models.

4.2.1. DAE and DCNN-based methods

Fig. 9. Structure of AE.

4.2.1.1. Introduction to DAE. As shown in Fig. 9, the auto-encoder (AE) is a typical unsupervised model [8] that can reconstruct input data through the operations of an encoder and a decoder. The input is x_i, and w_e and b_e are the weight and bias of the encoding layer. The data features of the hidden layer h_i are expressed as

h_i = f_e(w_e \cdot x_i + b_e)    (8)

where f_e is the activation function of the encoder network. With w_d and b_d denoting the weight and bias of the decoding layer, the reconstructed data \hat{x}_i can be defined as

\hat{x}_i = f_d(w_d \cdot h_i + b_d)    (9)

where f_d is the activation function of the decoder network. By minimizing the loss L(x_i, \hat{x}_i), the input data can be reconstructed by the AE:

L(x_i, \hat{x}_i) = \frac{1}{n} \sum_{i=1}^{n} \| x_i - \hat{x}_i \|^2    (10)

where n is the number of data points.

In the decoder network, the low-dimensional data h_i is used to reconstruct the high-dimensional data \hat{x}_i. Thus, h_i can be regarded as the features of the input data x_i. By stacking multiple encoding layers and multiple decoding layers, a DAE is constructed. Deep features of the input data can be collected using the DAE through layer-by-layer pre-training, and the collected deep features are available for data classification using classifiers like Softmax [12].
accurate fault identification [32–34]. in the mth convolutional layer, the convolution kernel k m ∈
ℜW ×D×H is used to learn the feature vector xm , where W is the
kernel number, D is the kernel depth. H represents the kernel
4.2. Feature extraction using regularized neural networks
height. The w th feature vector xm
w is obtained by
( )
The use of neural networks for fault feature extraction ∑
from monitoring data has been studied deeply. Recent research xw = σ
m m
k w,d × xm
d
−1 m
+ bw (11)
d
achievements show that regularized neural networks can process
small & imbalanced data effectively [88–91]. Moreover, in these where σ denotes the activation function. d = 1, 2, . . . , D, w =
achievements, deep auto-encoders (DAE) and deep convolutional 1, 2, . . . , W , xm
d
−1
is the dth feature vector in the m − 1th layer,
neural networks (DCNN) are favored as a basic model. and bw is the bias of the w th layer.
m


Fig. 10. Illustration of CNN. (a) The convolution operation, and (b) the pooling operation.
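The two operations in Fig. 10 can be sketched as a small 1-D DCNN, combining the convolution of Eq. (11) with the max pooling of Eq. (12) introduced below; the kernel sizes, strides, the 1024-point input, and the ten-class output are illustrative assumptions.

```python
# Sketch of a small 1-D DCNN stacking convolutional and pooling layers.
import torch
import torch.nn as nn

dcnn = nn.Sequential(
    nn.Conv1d(1, 16, kernel_size=64, stride=8), nn.ReLU(),   # Eq. (11)
    nn.MaxPool1d(2),                                         # Eq. (12), s = 2
    nn.Conv1d(16, 32, kernel_size=3), nn.ReLU(),
    nn.MaxPool1d(2),
    nn.Flatten(),
    nn.LazyLinear(10),   # maps the flattened features to 10 health conditions
)
logits = dcnn(torch.rand(8, 1, 1024))   # a batch of raw vibration segments
```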

On the other hand, the pooling layer plays the role of down-sampling: it reduces the size of the feature vectors and the number of parameters, which is meaningful for accelerating convergence. Max pooling is the most commonly used pooling method. As shown in Fig. 10(b), in the mth pooling layer, max pooling is calculated by

x_w^m = down(x_w^{m-1}, s)    (12)

where down(\cdot) is the down-sampling function and s is the pooling size.

Similar to the DAE, a DCNN can be built by stacking convolutional and pooling layers. Benefiting from the deep network structure, the DCNN has stronger feature extraction capability than a shallow CNN, so high-dimensional complex data can be processed handily with the DCNN.

4.2.1.3. Applications of regularized DAE and DCNN to feature extraction. DAE and DCNN with deep network structures often need a large volume of data for training, so they are not suitable for processing small & imbalanced data directly. Fortunately, regularization can help the training of DAE and DCNN with less training data while ensuring generalization ability. In intelligent fault diagnosis, regularized neural networks can extract fault features from a few fault samples and realize accurate fault classification. There are three commonly used kinds of regularized neural networks, i.e., sparse ones [23,30,31], normalized ones [24,93,94], and ensemble ones [95-97]. Among them, sparse neural networks reduce the parameters of the network through weight decay to decrease the risk of overfitting, thus ensuring the generalization ability with limited training data. For example, Saufi et al. [31] presented a stacked sparse auto-encoder (SSAE) for gearbox fault diagnosis with limited fault data. Taking the Kullback-Leibler divergence as the sparse penalty term, the number of parameters to be trained in the SSAE was reduced, so the diagnosis model can achieve better generalization performance and higher diagnosis accuracy than other deep neural networks using fewer training samples. Second, normalized neural networks reduce the adverse effect of data imbalance on the training process by normalizing the weights, which ensures strong data classification ability in the case of an imbalanced data distribution. For example, normalized convolutional networks (DNCNN) were used in [94] for imbalanced bearing fault identification. By applying a weight normalization strategy to construct the normalized convolutional and fully connected layers, the proposed DNCNN reduced the negative impact of data imbalance on fault classification. As a result, the proposed DNCNN was more effective in dealing with imbalanced fault classification than traditional CNNs. Finally, ensemble neural networks fuse data to prevent networks from overfitting in the case of small samples. In particular, there are two kinds of fused data, i.e., the extracted features [23,95] and the classification results [96,97]. For instance, Ren et al. [95] used a capsule network-based auto-encoder (CaAE) for intelligent fault identification of bearings, in which different local features were fused to construct feature capsules. The feature capsules were input into a classifier for fault identification, and the experimental results showed that fused feature capsules obtained better diagnosis accuracies with small training samples than independent local features. An ensemble convolutional neural network (EnCNN) was proposed in [96] for imbalanced fault identification of machinery. In EnCNN, the imbalanced raw data were split into different training subsets to train CNN-based classifiers, and the classification results from the multiple basic classifiers were integrated by a voting strategy. The integrated results were more conducive to accurate fault identification than a single result in the case of imbalanced training data.

In summary, DAE and DCNN have powerful data processing capability and can extract fault features from massive monitoring data automatically. However, such deep models update parameters by minimizing empirical risk, which means they are prone to overfitting when the training samples are insufficient [8]. Although recent studies have shown that regularized networks can improve generalization ability, it must be noted that designing high-quality regularization schemes for deep neural networks is a difficult problem requiring a large amount of research experience, because there are many choices of regularization methods. Moreover, compared with the standard DAE and DCNN, regularized network structures are generally more complex and difficult to train due to the introduction of other factors such as the sparse penalty term.

4.2.2. Other algorithms-based methods

In addition to regularized DAE and DCNN, other neural networks have also achieved results in feature learning from small and imbalanced data [89-91,98-100]. For example, Geng et al. [98] presented a diagnosis method based on a residual network with 17 convolutional layers for fault identification of bogies under the imbalanced data condition. The deep residual learning framework with stacked non-linear rectification layers made it possible to learn discriminative fault features from imbalanced Fast kurtogram images of mechanical signals. Liu et al. [99] used noise-assisted empirical mode decomposition for fault feature extraction from raw signals, and the extracted features were input into an enhanced fuzzy network for fault classification. Qian et al. [100] proposed an imbalanced learning scheme based on sparse filtering for fault feature extraction, which introduced a balancing matrix to balance the feature learning abilities of different classes. The results demonstrated that the presented feature learning model was effective for bearing fault diagnosis. In short, by modifying neural networks, fault features can be learned from small and imbalanced data, which is an important means of dealing with S&I-IFD.
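As a concrete note on the sparse regularization discussed in Section 4.2.1.3, the following sketch shows a Kullback-Leibler sparsity penalty of the kind used by sparse auto-encoders such as the SSAE of Saufi et al. [31]; sigmoid hidden activations are assumed, and the target activation rho is an illustrative hyperparameter.

```python
# KL sparsity penalty: pushes the mean activation of each hidden unit
# toward a small target rho; added to the reconstruction loss, e.g. Eq. (10).
import torch

def kl_sparsity(hidden, rho=0.05):
    rho_hat = hidden.clamp(1e-7, 1 - 1e-7).mean(dim=0)   # mean activation per unit
    kl = (rho * torch.log(rho / rho_hat)
          + (1 - rho) * torch.log((1 - rho) / (1 - rho_hat)))
    return kl.sum()
```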

Fig. 11. Feature adaptation based on transfer learning.

4.3. Feature adaptation using transfer learning

In addition to extracting fault features directly, feature adaptation with the help of other related datasets is another important way to learn fault features from small and imbalanced data. In the transfer learning scenario, the volume of target domain data samples is usually much smaller than that of the source domain. Moreover, because of the difference between the source domain and the target domain data distributions, their features are generally different. Feature adaptation based on transfer learning attempts to minimize the discrepancy between the feature distributions of the two domains, so that the features of the target domain data can also be learned well by the models, as shown in Fig. 11. Besides transfer component analysis (TCA) [101], many achievements in S&I-IFD have also been obtained using joint distribution adaptation (JDA) [102], deep neural networks (DNN) [34], and other approaches [103].

4.3.1. TCA and JDA-based methods

4.3.1.1. Introduction to TCA and JDA. TCA is a traditional feature adaptation method [101]. When the source domain data X^S has a different distribution from the target domain data X^T, a feature mapping \Phi is utilized to map both into a high-dimensional Hilbert space, where the target domain data has the minimized distance to the source domain data.

The maximum mean discrepancy (MMD) is used by TCA to calculate the distance between \Phi(X^S) and \Phi(X^T), which is described as follows:

dist(\Phi(X^S), \Phi(X^T)) = \left\| \frac{1}{n_S} \sum_{i=1}^{n_S} \Phi(x_i^S) - \frac{1}{n_T} \sum_{i=1}^{n_T} \Phi(x_i^T) \right\|    (13)

By using a kernel matrix K and a matrix L, the MMD between \Phi(X^S) and \Phi(X^T) can be rewritten in another form:

K = \begin{bmatrix} K_{S,S} & K_{S,T} \\ K_{T,S} & K_{T,T} \end{bmatrix}, \quad
L_{ij} = \begin{cases} 1/n_S^2, & x_i, x_j \in X_S \\ 1/n_T^2, & x_i, x_j \in X_T \\ -1/(n_S n_T), & \text{otherwise} \end{cases}    (14)

dist(\Phi(X^S), \Phi(X^T)) = trace(KL) - \lambda \cdot trace(K)    (15)

where \lambda is a tradeoff parameter that keeps the balance between distribution adaptation and parameter complexity.

Finally, the optimization goal of TCA can be described as

\min_W trace(W^T K L K W) + \lambda \cdot trace(W^T W), \quad s.t.\ W^T K H K W = I_m    (16)

where H = I_{n_S + n_T} - \frac{1}{n_S + n_T} \mathbf{1}\mathbf{1}^T is a centering matrix and \mathbf{1} \in R^{n_S + n_T} is an (n_S + n_T)-dimensional column vector with all elements equal to 1.
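The empirical form of the MMD in Eq. (13) can be sketched directly, with the explicit mapping \Phi replaced by a Gaussian kernel (the usual kernel trick); the bandwidth gamma is an assumed hyperparameter, not a value from the reviewed works.

```python
# Biased empirical MMD^2 estimate between two sets of samples (rows).
import torch

def gaussian_mmd(xs, xt, gamma=1.0):
    def k(a, b):  # Gaussian kernel matrix between the rows of a and b
        return torch.exp(-gamma * torch.cdist(a, b).pow(2))
    # Within-domain similarity minus cross-domain similarity.
    return k(xs, xs).mean() + k(xt, xt).mean() - 2 * k(xs, xt).mean()
```

TCA goes beyond measuring this distance: it learns the projection W of Eq. (16) that minimizes it subject to the variance constraint.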
JDA is an improved variant of TCA [102]. TCA adapts only the marginal probability distribution, while JDA adapts not only the marginal but also the conditional probability distribution between the source and the target domain data. As a result, the optimization goal of JDA is described as

\min_W \sum_{c=0}^{C} trace(W^T X L_c X^T W) + \lambda \|W\|_F^2, \quad s.t.\ W^T X H X^T W = I    (17)

where c is the class label, and L_c is

(L_c)_{ij} = \begin{cases} 1/n_{S,c}^2, & x_i, x_j \in X_{S,c} \\ 1/n_{T,c}^2, & x_i, x_j \in X_{T,c} \\ -1/(n_{S,c} n_{T,c}), & x_i \in X_{S,c}, x_j \in X_{T,c} \text{ or } x_i \in X_{T,c}, x_j \in X_{S,c} \\ 0, & \text{otherwise} \end{cases}    (18)

where the number of samples in class c from the source domain is n_{S,c} and that from the target domain is n_{T,c}.

4.3.1.2. Applications of TCA and JDA to feature adaptation. Some scholars have introduced TCA and JDA into their transfer learning schemes for feature adaptation. For instance, Chen et al. [104] used a transfer learning fault identification method for rolling bearings using a few faulty samples, in which TCA was applied for feature adaptation to learn transferable fault features from raw data. Xie et al. [105] and Duan et al. [106] extracted transferable fault features from gearbox vibration signals using TCA, and the experimental results showed that their models were effective for gearbox fault identification in the small sample case. Besides, Han et al. [107] and Qian et al. [108] applied JDA to transferable feature learning considering the problem of lacking target domain samples; the effectiveness of feature adaptation was verified using a bearing dataset and a gearbox dataset, respectively.

Traditional TCA- and JDA-based feature adaptation approaches are simple in calculation and can reduce the discrepancy between the feature distributions of the two domains. However, both TCA and JDA narrow the difference between two distributions by mapping low-dimensional raw data to a high-dimensional Hilbert space. When they meet complex high-dimensional mechanical data, they cannot fit it well. Thus, the diagnosis accuracy of TCA- and JDA-related models on complex diagnosis tasks is usually poor.

4.3.2. Deep neural networks-based methods

Different from TCA and JDA, deep neural networks can learn data features from the original data samples directly by minimizing the distribution discrepancy between the target and the source domain features. Using basic distance metrics of distribution discrepancy, some scholars built deep transfer diagnosis models based on the Kullback-Leibler (KL) divergence to achieve feature adaptation.

For example, a transfer network was constructed by Qian et al. [109] for machine fault identification, in which a distribution discrepancy measuring metric named auto-balanced KL divergence (AHKL) was developed for fault feature adaptation. After feature extraction, the first- and the higher-order moment discrepancies of the features from the two domains were measured by AHKL, and the discrepancies between them were reduced by

$$ \min\ \sum_{i=1}^{N}\left[\mu_{i}\cdot L_{i1}+\left(1-\mu_{i}\right)\cdot\sum_{j=2}^{n}L_{ij}\right]\qquad \text{s.t. } 0\leq\mu_{i}\leq1 \tag{19} $$

where the number of data points in each sample is N and the number of moment orders is n. The discrepancy vector of the nth-order moment is L_n, and μ_i is a parameter vector that weighs L_1 against $\sum_{j=2}^{n}L_{j}$.

In addition to KL divergence, another distance metric for measuring distribution discrepancies is the maximum mean discrepancy (MMD). Many research achievements based on feature adaptation using deep neural networks have applied MMD to develop their diagnosis schemes to deal with the small sample problem [110]. For example, Li et al. [32] developed a deep balanced feature adaptation model with multiple convolutional layers for gearbox fault diagnosis using limited labeled data samples. The fault features were extracted from raw data, and then MMD was applied to measure the discrepancy of the conditional and the marginal probability distributions of the extracted features. The presented network was optimized by

$$ \min_{\theta}\ \sum_{j=1}^{N}\lambda D_{\theta}^{j}\left(X_{M}^{S},X_{M}^{T}\right)+\left(1-\lambda\right)\sum_{i=1}^{n}D_{\theta}^{i}\left(X_{C_{i}}^{S},X_{C_{i}}^{T}\right) \tag{20} $$

where $D_{\theta}^{j}(X_{M}^{S},X_{M}^{T})$ is the discrepancy of the marginal probability distribution in the jth network layer, and $D_{\theta}^{i}(X_{C_{i}}^{S},X_{C_{i}}^{T})$ is the discrepancy of the conditional probability distribution in the ith class. The numbers of network layers and classes are N and n, and λ is a real number less than 1. To further improve the performance of feature adaptation, many variants of the original MMD have been proposed by scholars. For instance, Yang et al. [33] constructed a convolutional adaptation scheme by minimizing multi-kernel MMD. A multi-layer MMD-based feature adaptation framework was presented by Li et al. [34] to identify bearing faults using a few faulty samples.
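As an illustration of how such an MMD term enters a training objective, here is a small NumPy sketch in the spirit of Eq. (20); the RBF kernel, the use of pseudo-labels for the target conditional term, and all names are our assumptions and not details of [32]:

```python
import numpy as np

def mmd_rbf(Fs, Ft, sigma=1.0):
    """Squared MMD between source features Fs and target features Ft
    with an RBF kernel -- the building block of Eq. (20)."""
    def k(a, b):
        d = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2 * a @ b.T
        return np.exp(-d / (2 * sigma**2))
    return k(Fs, Fs).mean() + k(Ft, Ft).mean() - 2 * k(Fs, Ft).mean()

def adaptation_loss(feats_s, feats_t, labels_s, pseudo_t, lam=0.5):
    """Weighted sum of per-layer marginal MMD and per-class conditional
    MMD, mirroring Eq. (20). feats_* are lists of per-layer feature
    arrays; target labels are approximated by pseudo-labels."""
    marginal = sum(mmd_rbf(fs, ft) for fs, ft in zip(feats_s, feats_t))
    conditional = sum(
        mmd_rbf(feats_s[-1][labels_s == c], feats_t[-1][pseudo_t == c])
        for c in np.unique(labels_s)
        if (pseudo_t == c).any()
    )
    return lam * marginal + (1 - lam) * conditional
```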
Although MMD is effective in measuring distribution discrepancy, the computational cost of MMD increases quickly as the number of samples increases. Compared with MMD, the Wasserstein distance is a more reasonable distance metric for measuring distribution discrepancy, and it has also been used in feature adaptation tasks. Cheng et al. [111] used a deep feature adaptation scheme for fault classification using a few labeled target samples, in which the Wasserstein distance was utilized to calculate the discrepancy between the target and the source domain features. The proposed method was trained by minimizing the Wasserstein distance between the features from the two domains, which can be described as

$$ \min_{\theta}\ \frac{1}{n_{S}}\sum_{i=1}^{n_{S}}f_{L}\left(f_{\theta}\left(x_{i}^{s}\right)\right)-\frac{1}{n_{T}}\sum_{i=1}^{n_{T}}f_{L}\left(f_{\theta}\left(x_{i}^{T}\right)\right) \tag{21} $$

where f_θ denotes the convolutional feature extractor, and f_L is the Lipschitz function that satisfies the gradient constraint in calculating the Wasserstein distance. The sample numbers in the source and the target domain are n_S and n_T, respectively.

In addition to minimizing a distance metric, another way of feature adaptation using deep neural networks is adversarial training. Inspired by GAN, adversarial training can also reduce the discrepancy of two distributions. For example, Han et al. [103] constructed an adversarial transfer learning model for wind turbine fault diagnosis using limited training samples. In the presented work, the feature descriptor composed of multiple convolutional layers extracted fault features from the samples in the two domains. The discrepancy of the two feature distributions was minimized by a discriminative classifier through adversarial training, and the health conditions were output by a fault classifier in the end. The proposed method was trained by

$$ \min_{\theta}\ \frac{1}{n_{S}}\sum_{i=1}^{n_{S}}J\left(y_{i}^{s},\tilde{y}_{i}^{s}\right)-\left[\frac{1}{n_{S}}\sum_{i=1}^{n_{S}}\log D_{\theta}\left(x_{i}^{s}\right)+\frac{1}{n_{T}}\sum_{j=1}^{n_{T}}\log\left(1-D_{\theta}\left(x_{j}^{T}\right)\right)\right] \tag{22} $$

where the classification loss is the first term and the adversarial loss between the two feature distributions is the second term. After adversarial training, the diagnosis model could also serve well in the target diagnosis tasks.
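A minimal PyTorch sketch of the two loss terms behind Eq. (22) is given below; `extractor`, `classifier`, and `domain_disc` are hypothetical stand-ins for the feature descriptor, fault classifier, and discriminative classifier of [103], and the single-logit discriminator output is our assumption:

```python
import torch
import torch.nn.functional as F

def transfer_losses(extractor, classifier, domain_disc, xs, ys, xt):
    """Loss terms of an adversarial transfer objective in the spirit of
    Eq. (22); domain_disc is assumed to output one logit per sample."""
    fs, ft = extractor(xs), extractor(xt)

    # First term: supervised classification loss on labeled source data.
    cls_loss = F.cross_entropy(classifier(fs), ys)

    # Second term: domain-discrimination loss (source -> 1, target -> 0).
    dom_loss = (F.binary_cross_entropy_with_logits(
                    domain_disc(fs), torch.ones(fs.size(0), 1))
                + F.binary_cross_entropy_with_logits(
                    domain_disc(ft), torch.zeros(ft.size(0), 1)))
    return cls_loss, dom_loss

# Training alternates two updates: the discriminator minimizes dom_loss,
# while the feature extractor minimizes cls_loss - dom_loss, so the two
# feature distributions become indistinguishable (the minimax of Eq. (22)).
```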
Due to their strong data processing ability, deep neural networks-based feature adaptation approaches can usually output better diagnosis results than traditional TCA and JDA. Nevertheless, the feature adaptation ability sometimes depends on the distance metric. Besides, deep neural networks-based feature adaptation schemes assume that the feature spaces of the two domains overlap to some extent; however, existing studies cannot tell whether there is overlap between them. The diagnosis models may perform poorly on the target diagnosis task if the discrepancy of the feature distributions cannot be described explicitly.

4.4. Epilog

The achievements on S&I-IFD using the feature learning-based strategy are reviewed in this section, and they are divided into two classes. The first is to use regularized neural networks, such as sparse ones, to extract fault features from limited fault data directly. The second is feature adaptation with the help of other related datasets based on transfer learning. Through feature adaptation, transferable fault features are expected to be learned by diagnosis models to achieve accurate fault classification. However, the feature learning-based strategy also has shortcomings. First, since the fault information provided by a small number of fault data is always limited, the diagnosis performance improvement brought by the feature learning-based models is also limited. Second, feature adaptation based on transfer learning requires similarity of the feature distributions between different datasets. However, in engineering scenarios, it is difficult to construct an auxiliary transferable dataset. Moreover, feature adaptation usually involves the selection of a distance metric, which makes it hard to achieve the optimal diagnosis results.

5. Classifier design-based strategy for S&I-IFD

5.1. Motivation

In the process of intelligent fault diagnosis, fault identification using a fault classifier is the last step. The classification performance of the fault classifier is an important index that determines the fault identification accuracy. In the case of lacking fault data, the trained classifier is usually over-fitted and the classification accuracy is low. If the fault classifier can be designed to have strong generalization ability for small and imbalanced data, it is hopeful to achieve accurate fault identification in the case of lacking machine fault data.
Scholars have also done a lot of work on S&I-IFD from the perspective of fault classifier design. According to whether auxiliary datasets are used or not, the design of the fault classifier follows two ideas. The first is to use the small and imbalanced data to modify the original fault classifier directly, such as constructing a cost-sensitive fault classifier [38–40]. The second is to pre-train the classifier with the help of other related datasets based on transfer learning to achieve good classification performance [41,42,112].

5.2. Fault classifier design using small and imbalanced data

In this part, fault classifiers are designed based on small and imbalanced data directly. As a specialized model for processing small samples, the support vector machine (SVM) [8] and its variants can improve the fault classification accuracy with limited faulty data samples [113]. Besides, cost-sensitive learning [19] is dedicated to learning information from imbalanced data distributions by applying a cost-sensitive loss function. The cost-sensitive learning-based fault classifier can also provide an effective solution for S&I-IFD.

[Fig. 12. Illustration of SVM.]

5.2.1. SVM-based methods

5.2.1.1. Introduction to SVM. SVM is a classical data classifier. As given in Fig. 12, SVM aims at finding a hyperplane in the feature space that is expected to correctly classify data samples as far as possible.

For a training dataset $\{x_i, y_i\}_{i=1}^{M}$, x_i is the ith sample and the sample label is y_i ∈ [1, −1]. The hyperplane H(x) can be described as

$$ H\left(x\right)=w\cdot x+b=\sum_{i=1}^{M}w\cdot x_{i}+b=0 \tag{23} $$

where the parameters of H(x) are w and b. Moreover, to classify the data samples into two classes (the positive one and the negative one), H(x) should be subject to

$$ y_{i}H\left(x_{i}\right)=y_{i}\left(w\cdot x_{i}+b\right)\geq1,\quad i=1,2,\ldots,M. \tag{24} $$

As given in Fig. 12, H′(x) and H′′(x) are the two hyperplanes satisfying the constraints in Eq. (24). The distance from x_i to H(x) can be calculated as d_i:

$$ d_{i}=\frac{y_{i}\left(w\cdot x_{i}+b\right)}{\left\|w\right\|}. \tag{25} $$

Therefore, the margin γ between H′(x) and H′′(x) is $\frac{2}{\|w\|}$. As a result, SVM will find the hyperplane H(x) between H′(x) and H′′(x), which maximizes the margin γ by optimizing the objective loss function L:

$$ L=\arg\max_{w,b}\left\{\min\frac{y_{i}\left(w\cdot x_{i}+b\right)}{\left\|w\right\|}\right\}=\arg\max_{w,b}\left(\frac{2}{\left\|w\right\|}\right). \tag{26} $$

For the convenience of calculation, the loss function L is rewritten as follows:

$$ L=\min_{w,b}\ \frac{1}{2}\left\|w\right\|^{2}\qquad \text{s.t. } y_{i}\left(w\cdot x_{i}+b\right)\geq1,\quad i=1,2,\ldots,M. \tag{27} $$

5.2.1.2. Applications of SVMs to fault classification. Some researchers utilized SVM and its variants to classify limited fault data [114–119]. For example, a K-means based SVM-tree and SVM-forest were developed in [114], in which the K-means algorithm was introduced to SVM for sensitive sample selection from an imbalanced dataset. The results indicated that the presented network improved the diagnosis performance using a few faulty data samples. Xi et al. [116] proposed a least-squares SVM (LSSVM-CIL) with parameter regularization for the imbalanced fault detection of aircraft engines, in which the size of the support vectors was reduced and the representative fault samples were retained using a recursive strategy. The experimental results proved that LSSVM-CIL was more effective than related methods in imbalanced fault detection. Based on the traditional SVM, He et al. [118] presented a nonlinear support tensor machine containing a dynamic penalty factor (DC-NSTM) for fault identification of machines in the limited faulty samples case. A tensor kernel function was added to the DC-NSTM so that it could process the nonlinear separable problem and improve the overall classification accuracy with small training samples.

Generally, the SVM-based fault classifiers are optimized by minimizing the overall structural risk of the training samples [8], so they are more suitable for dealing with limited fault data than deep neural networks, which are optimized by minimizing the empirical risk. However, two drawbacks restrict the applications of SVM. First, the diagnosis accuracy of SVM is sensitive to the setting of the kernel parameters. How to choose a set of high-quality kernel parameters is one of the core issues when using the SVM-based fault classifier. Second, although SVM is good at handling small sample problems, it is difficult for it to fit massive monitoring data. With the development of data acquisition technologies, the monitoring data of machines increases rapidly, which will bring computing challenges to the SVM-based fault classifier.
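As a minimal illustration of an SVM-based fault classifier that addresses the kernel-parameter issue raised above, the following scikit-learn sketch selects the kernel parameters by cross-validated grid search; the parameter grids and the balanced class weighting are our illustrative assumptions:

```python
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import GridSearchCV

# RBF-kernel SVM; class_weight='balanced' raises the penalty on the
# minority (fault) classes of an imbalanced training set.
clf = make_pipeline(StandardScaler(),
                    SVC(kernel='rbf', class_weight='balanced'))

# Cross-validated search over the kernel parameters that Section 5.2.1
# flags as critical for diagnosis accuracy.
search = GridSearchCV(clf,
                      {'svc__C': [0.1, 1, 10, 100],
                       'svc__gamma': ['scale', 0.01, 0.1]},
                      cv=3)
# Usage (X_train, y_train are feature vectors and fault labels):
# search.fit(X_train, y_train); y_pred = search.predict(X_test)
```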
minimizing the overall structural risk of training samples [8],
where the parameters of H (x) are w and b. Moreover, to classify
so they are more suitable for dealing with the limited fault
the data samples into two classes (the positive one and the
data compared to deep neural networks, which are optimized by
negative one), H (x) should be subject to
minimizing the empirical risk. However, two drawbacks restrict
yi H (xi ) = yi (w · xi + b) ≥ 1, i = 1, 2, . . . , M . (24) the applications of SVM. First, the diagnosis accuracy of SVM is
sensitive to the setting of kernel parameters. How to choose a
As given in Fig. 12, H (x) and H (x) are the two hyperplanes
′ ′′
set of high-quality kernel parameters is one of the core issues
satisfying the constraints in Eq. (24). The distance from xi to H (x) when using the SVM-based fault classifier. Second, although SVM
can be calculated as di . is good at handling small sample problems, it is difficult to fit
yi (w · xi + b) massive monitoring data. With the development of data acqui-
di = . (25) sition technologies, the monitoring data of machines increases
∥w∥ rapidly, which will bring computing challenges to the SVM-based
Therefore, the margin γ between H ′ (x) and H ′′ (x) is ∥w∥
2
. As a fault classifier.
result, SVM will find the hyperplane H (x) between H (x) and

H ′′ (x), which can maximize the margin γ by optimizing the 5.2.2. Cost-sensitive classifier-based methods
5.2.2.1. Introduction to cost-sensitive learning. As a learning
objective loss function L.
paradigm, Cost-sensitive learning [19] will give different misclas-
yi (w · xi + b)
{ ( )} ( )
2 sification losses to different classes contained in a classification
L = arg max min = arg max . (26)
task. Cost-sensitive learning aims at reducing all misclassification
w,b ∥w∥ w,b ∥w∥
costs on the whole dataset. In other words, cost-sensitive learning
For the convenience of calculation, the loss function L is rewritten will give more attention to the samples in the minority classes
as follows: to improve the overall classification performance on imbalanced
1 datasets.
L = min ∥w∥2 Given a training dataset {xi , yi }M
w,b
2 . (27) i=1 containing M training sam-
ples, the ith sample is xi and the ith sample label is yi ∈
s.t.yi (w · xi + b) ≥ 1, i = 1, 2, . . . , M
[1, 2, . . . , K ]. Assume a misclassification loss Cu,v , which repre-
5.2.1.2. Applications of SVMs to faults classification. Some sents the loss or the penalty of misclassifying the sample xi in
researchers utilized SVM and its variants to classify limited fault class u to class v . For a classification task, the minimum misclas-
data [114–119]. For example, a K-means based SVM-tree and sification loss should be achieved when classifying the sample
SVM-forest were developed in [114], in which the K-means al- xi into a class. Specifically, the misclassification loss L ( u| xi ) of
gorithm was introduced to SVM for sensitive samples selection sample xi classified into class u can be described as
from an imbalanced dataset. The results indicated that the pre- K

sented network improved the diagnosis performance using a few L ( u| x i ) = P ( v| xi ) Cu,v (28)
faulty data samples. Xi et al. [116] proposed a least-squares SVM v=1
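A small worked example of the decision rule in Eq. (28) may help: with a hypothetical cost matrix that prices missing a rare fault higher than raising a false alarm, the minimum-cost class can differ from the most probable class.

```python
import numpy as np

# Hypothetical 3-class example: class 0 is "normal" (majority) and
# classes 1-2 are rare fault modes. C[u, v] weights deciding class u
# when class v is true, following Eq. (28); the diagonal is zero, and
# calling a fault "normal" (row 0) is priced highest.
C = np.array([[0.0, 5.0, 5.0],
              [1.0, 0.0, 2.0],
              [1.0, 2.0, 0.0]])

p = np.array([0.6, 0.3, 0.1])    # P(v | x_i) from a probabilistic classifier
risk = C @ p                      # L(u | x_i) of Eq. (28) for every u
decision = int(np.argmin(risk))   # minimum-risk class, not argmax of p
print(risk, decision)             # risks [2.0, 0.8, 1.2] -> class 1
```

Here the most probable class is 0 ("normal"), but the cost matrix overrules it and the minimum-risk decision is fault class 1.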
5.2.2.2. Applications of cost-sensitive classifiers to fault classification. In S&I-IFD, how to design and assign the misclassification loss C_{u,v} is the key to the application of cost-sensitive learning. For an imbalanced dataset, the imbalance ratio is an important index to measure the imbalance degree. For a training dataset $\{x_i, y_i\}_{i=1}^{M}$, the ith sample is x_i and the ith sample label is y_i ∈ [1, 2, ..., K]. The imbalance ratio r_{u,v} of the class u to the class v is defined as

$$ n\left(k\right)=\sum_{i=1}^{M}\mathbf{1}\left\{y_{i}=k\right\} \tag{30} $$

$$ r_{u,v}=\frac{n\left(v\right)}{n\left(u\right)} \tag{31} $$

where n(k) represents the number of samples in class k, and 1{·} is an indicator function returning 1 if y_i = k and 0 otherwise. It is a common choice to design the misclassification loss C_{u,v} based on the data imbalance ratio, i.e., if u ≠ v, C_{u,v} = r_{u,v}, and if u = v, C_{u,v} = 0. By this design, the classification model pays more attention to the minority classes to improve their identification accuracy. Many studies have shown the effectiveness of applying the imbalance ratio to the design of the cost-sensitive loss function [94,98,120]. For example, Geng et al. [98] presented a diagnosis scheme using deep residual feature learning, in which the imbalance-weighted cross-entropy (IWCE) was used for imbalanced fault classification. The original cross-entropy (CE) can be described as

$$ \mathrm{CE}=-\sum_{i=1}^{K}\vec{y}_{i}\log\hat{P}_{i} \tag{32} $$

where the class number is K, $\vec{y}_i$ is the one-hot vector representing the label information, and $\hat{P}_i$ denotes the output of the softmax classifier. Based on the original CE, IWCE uses the data imbalance ratios to weight the minority classes to enhance the influence of their samples:

$$ \mathrm{IWCE}=-\sum_{i=1}^{K}w_{i}\vec{y}_{i}\log\hat{P}_{i} \tag{33} $$

where w_i is a function related only to the data imbalance ratios.

Besides, some researchers have combined the real-time classification results and the data imbalance ratios to design the cost-sensitive loss function, because the real-time training results are thought to be able to indicate the updating of the parameters [38,39,121]. For instance, Dong et al. [38] adopted a cost-adaptive network structure for imbalanced mechanical data classification, in which the cost-sensitive loss function L was designed as follows:

$$ L=-\sum_{i=1}^{K}t_{i}\vec{y}_{i}\log\hat{P}_{i} \tag{34} $$

where t_i is a function related to the data imbalance ratios, the evaluation metric G_{mean}, and the Euclidean distance E_d:

$$ t_{i}=r_{i}\cdot\exp\left(-\frac{G_{mean}}{2}\right)\cdot\exp\left(-\frac{1}{2E_{d}}\right) \tag{35} $$

$$ G_{mean}=\sqrt{\frac{TP}{TP+FN}\cdot\frac{TN}{TN+FP}} \tag{36} $$

$$ E_{d}=\sqrt{\frac{1}{n\left(k\right)}\sum_{i=1}^{n\left(k\right)}\left(\vec{y}_{i}-\hat{P}_{i}\right)^{2}} \tag{37} $$

where r_i is the data imbalance ratio, and true positive, false positive, true negative, and false negative are represented by TP, FP, TN, and FN.

On the whole, cost-sensitive learning pays more attention to the fault samples in the minority classes through misclassification loss assignment, which ensures the fault identification accuracy of the minority fault samples. The output of the cost-sensitive fault classifier is sensitive to the design of the cost-sensitive loss function. Most of the current research achievements set the cost-sensitive loss function based on the data imbalance ratios, which is indeed effective, but how to update it to obtain better results is still worth exploring. In the future, one possible solution is to set the cost-sensitive loss function automatically using the attention mechanism [122], which has been applied successfully in sensitive information selection and adaptive weight assignment.

5.3. Fault classifier design using transfer learning

In this part, fault classifiers are designed with the help of other related datasets. In the transfer learning scenario, some model parameters can be shared by the target and the source domain data [86]. Based on this, scholars use parameter transfer-based approaches to design the classifier. After pre-training with the source domain data, the parameters of the fault classifier are fine-tuned using a few target domain samples. As a result, the fine-tuned fault classifier is expected to achieve high classification accuracy in the diagnosis tasks.

5.3.1. Parameter transfer-based methods

In parameter transfer-based methods, the parameters of diagnosis models are first pre-trained using sufficient source domain data. After that, the classification layers of the pre-trained models are fine-tuned using a few target domain data. The idea of parameter transfer-based approaches is relatively simple, but it is widely used. For example, Kim et al. [43] and Li et al. [123] constructed parameter transfer-based fault classifiers with a deep convolutional neural network (DCNN), whose parameters were pre-trained using an existing dataset. After pre-training, the Softmax classifier in the last layer of the DCNN was fine-tuned using another small dataset. The fine-tuned Softmax classifier was able to classify data samples in the new dataset. The methods had good diagnosis performance for bearings using small training samples. Similarly, to identify gearbox faults using small training samples, Cao et al. [124] and Wu et al. [125] applied parameter-based transfer learning for fault classifier design with a DCNN; the experimental results showed that the pre-trained fault classifiers could obtain high accuracy on target gearbox diagnosis tasks after fine-tuning.

Besides, some scholars believe that updating all the model parameters in the fine-tuning stage is more helpful for accurate fault identification than just updating the classifier layers. Therefore, the fault classifier can be obtained after global parameter fine-tuning in this case. For example, He et al. [112] applied a transfer learning model based on a multi-wavelet deep auto-encoder for gearbox fault classifier design, in which all the model parameters were pre-trained using vibration data from one working condition and fine-tuned with vibration data from another working condition. After fine-tuning, the obtained classifier could achieve high diagnosis accuracy in the new working condition.
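The following PyTorch sketch illustrates the first fine-tuning idea of Section 5.3.1 on toy data: the backbone stands in for a network whose weights would, in practice, come from pre-training on a large source-domain dataset, and only the classification layer is re-trained on a few target samples. The architecture, class count, and data are entirely illustrative:

```python
import torch
import torch.nn as nn

# Minimal stand-in for a pre-trained diagnosis network.
backbone = nn.Sequential(nn.Conv1d(1, 8, 16), nn.ReLU(),
                         nn.AdaptiveAvgPool1d(1), nn.Flatten())
classifier = nn.Linear(8, 4)          # 4 hypothetical fault classes
model = nn.Sequential(backbone, classifier)

# Freeze the pre-trained backbone; re-train only the classifier layer
# (global fine-tuning would instead leave all parameters trainable).
for p in backbone.parameters():
    p.requires_grad = False
opt = torch.optim.Adam(classifier.parameters(), lr=1e-3)

x = torch.randn(8, 1, 256)            # 8 target-domain vibration snippets
y = torch.randint(0, 4, (8,))         # their labels (toy data)
for _ in range(20):
    loss = nn.functional.cross_entropy(model(x), y)
    opt.zero_grad(); loss.backward(); opt.step()
```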
Similarly, Li et al. [41] and He et al. [42] adopted deep transfer auto-encoders to design fault classifiers for bearings, and the fault classifiers were obtained after fine-tuning with a few target domain samples.

In general, the size of the source domain dataset will influence the classification accuracy of the fault classifiers obtained using parameter transfer-based approaches. The larger the source domain dataset used for pre-training is, the better the performance of the obtained fault classifier is. However, it is difficult to construct an ideal pre-training dataset in practice, which is one of the major problems in the application of parameter transfer-based fault classifier design. If the source domain dataset is not large enough, the fault classifier obtained in this way will have poor diagnosis performance on the target diagnosis tasks.

5.4. Epilog

This section reviews the achievements in dealing with S&I-IFD based on the classifier design strategy. According to whether auxiliary datasets are used or not, fault classifier design-based strategies follow two ways. The first is designing fault classifiers using small and imbalanced data directly, such as optimizing SVM or developing cost-sensitive classifiers. This kind of method generally depends on the engineering experience of the researchers, especially in the design of cost-sensitive loss functions, so the optimal results are difficult to achieve. The second is to use auxiliary datasets to pre-train diagnosis models and then fine-tune the classifier with a few fault data to get the final fault classifier. The performance of the fault classifier obtained in this way depends on the quality of the auxiliary dataset. When the auxiliary dataset is not large enough, the classification ability of the fault classifier is usually not strong enough.

6. Future challenges and possible extensions for S&I-IFD

In the end, we try to discuss future challenges and provide some possible extensions for S&I-IFD based on the existing research achievements.

6.1. How to improve the quality of the augmented samples in S&I-IFD?

Benefiting from new machine learning theories and technologies like GAN and VAE, many existing achievements have proved that the performance of S&I-IFD can be improved by expanding the size of the training sample set using data generation and over-sampling. However, by reviewing these research achievements, it can be found that the existing researches mainly focus on expanding the size of the fault data samples and lack attention to the quality of the samples. Specifically, when the size of the training samples is too small, the samples generated by generative models are too similar to the real samples, which means the fault information added in this way is very limited. For data over-sampling models like SMOTE, the synthesized fault samples have a strong linear relationship with the training samples due to the problem of distributional marginalization [83]. Although these generated samples can expand the size of the training samples, it is not clear how much fault information they can provide for the training of the diagnosis models. If they cannot provide more fault information, the low-quality generated samples will bring only a limited improvement to the diagnosis performance of the intelligent diagnosis models.

In future researches, the authors believe that researchers need to pay attention not only to the size of the samples but also to their quality. First, in addition to data generation, data over-sampling, and data reweighting, more different data augmentation ways can be applied [126–129]. For example, Yu et al. [126] tried seven kinds of data augmentation strategies via hand-crafted rules to augment the vibration signals of rolling bearings, including local data reversing, local random reversing, global data reversing, local data zooming, global data zooming, local segment splicing, and noise addition. Compared with other data augmentation strategies such as data generation, these data augmentation methods require fewer computing resources and less computing time. Moreover, experimental results showed that these data augmentation methods could also improve the diagnosis performance of S&I-IFD significantly. Besides, the existing data augmentation strategies are often tailor-made for each dataset and cannot be easily used on other datasets [17]. To address this, scholars proposed AutoAugment [130], which can automatically learn a data augmentation strategy for a neural network. Inspired by this, fault data samples augmented through AutoAugment may provide a good solution for S&I-IFD. In addition to the data augmentation methods mentioned above, some researchers used semi-supervised learning-based models to select data samples with target labels from a large unlabeled dataset to expand the target dataset directly [131]. In engineering scenarios, unlabeled monitoring datasets are easier to collect and usually have a larger size than labeled datasets. Therefore, the use of unlabeled datasets is also helpful to expand the limited target datasets and improve the performance of diagnosis models.

Second, how to establish sample quality evaluation indexes is also an important issue. In [12] and [52], researchers used the Pearson correlation coefficient to evaluate the similarity of the generated data and the real data. However, excessive similarity between the generated and the real data will lead to information redundancy, which brings a very limited improvement to the generalization ability of the diagnosis models. Therefore, it is not appropriate to evaluate the generated samples' quality only from similarity. From the aspect of data augmentation, it is also significant to establish a relatively objective and reliable evaluation index for the generated samples to improve the diagnosis performance of S&I-IFD.

6.2. How to prevent transfer learning-based approaches from negative transfer in S&I-IFD?

Among the three strategies, transfer learning-based approaches account for a large proportion, so transfer learning is an important theory for S&I-IFD. However, when negative transfer occurs, the transfer learning-based models will perform poorly in the case of lacking data samples. Negative transfer refers to the case in which the knowledge extracted in the source domain harms the target task [86]. Negative transfer will occur if the distribution discrepancies between the target and the source domain data are too big. For example, when the source domain data are the faulty samples of bearings while the target data are the faulty samples of gears, the knowledge learned from the bearing faulty samples is meaningless or even has a negative impact on the gear fault diagnosis. In addition, the transferable components between the two domains are the foundation of transfer learning, like data samples, data features, or model parameters. In some cases, although the data distributions in the two domains are similar, negative transfer may also occur when the diagnosis model fails to find the components that can be transferred. For example, the physical structures of motors and generators are similar and their fault data distributions are also similar. However, if the transfer learning-based diagnosis models cannot find the components that can be transferred, the diagnosis knowledge learned from the motor fault data is useless for the generator fault diagnosis.

It is a big challenge for S&I-IFD to avoid negative transfer. First, to describe the discrepancies of the data distributions in the target and the source domain, reasonable measurement rules need to be developed. In existing researches, most researchers rely on engineering experience to judge the similarity of the data distributions in the two domains; however, a unified and effective standard is lacking. Therefore, developing a distribution similarity metric is worth exploring in future research.
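As one possible form of such a measurement rule, the following sketch screens a candidate source dataset with a kernel MMD before transfer; the kernel choice and the acceptance threshold are our illustrative assumptions, not an established standard:

```python
import numpy as np

def mmd_screen(Xs, Xt, threshold=0.05, sigma=1.0):
    """Screen a candidate source dataset before transfer: a large kernel
    MMD between source and target features warns of dissimilar
    distributions and hence a risk of negative transfer."""
    def k(a, b):
        d = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2 * a @ b.T
        return np.exp(-d / (2 * sigma**2))
    mmd2 = k(Xs, Xs).mean() + k(Xt, Xt).mean() - 2 * k(Xs, Xt).mean()
    return mmd2, mmd2 < threshold   # accept the source only if similar enough
```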
Second, to build effective diagnosis models, the idea of transitive transfer learning is worth trying [132,133]. Different from traditional transfer learning methods, which involve only two domains, transitive transfer learning connects multiple related domains and updates the learned knowledge in a transitive manner, which may provide a feasible idea for the construction of general transfer learning-based diagnosis models for S&I-IFD.

6.3. Meta-learning theory and its possible applications in S&I-IFD

Meta-learning, or learning to learn, is an outstanding and new machine learning theory. The purpose of meta-learning is to raise the learning level from data to tasks and enable algorithms to obtain transferable knowledge from multiple tasks [134]. By training on various related tasks with few data, knowledge can be accumulated over several training episodes and applied to a new but related task without fine-tuning [135], which makes meta-learning-based methods suitable for dealing with small sample problems.

Generally speaking, meta-learning has three categories: optimization-based methods, model-based methods, and metric-based methods [136]. Among them, optimization-based models aim at learning the meta-knowledge, which is the initialization parameters of the network, and then iterate them with a few training samples to get good classifiers. Model-Agnostic Meta-Learning (MAML) [137] is the most famous optimization-based meta-learning method. Model-based methods are good at data-efficient few-shot learning [138]. They can embed the current training dataset into an activation state and predict the test data based on this state. Recurrent neural networks [139], convolutional neural networks [140], and hyper-networks [141] are the typical architectures of model-based meta-learning. Finally, metric-based methods are trained by comparing the training datasets with the validation datasets. The Siamese network [142], matching network [143], prototypical network [144], and relation network [145] are typical metric-based meta-learning models.

On the whole, meta-learning-based models have two obvious characteristics. The first is that meta-learning-based models are trained through learning "N-way K-shot" tasks, where the number of classes is N and the number of training samples in each class is K. Generally, K is small, which means meta-learning is suitable for the case of lacking fault samples in engineering scenarios. The second is that meta-learning-based models have strong generalization ability. Some models like the matching network [146] can perform well in classification tasks even containing new class data that have not been seen in the training stage, which means meta-learning is good at dealing with actual problems in engineering scenarios.

It is worth noting that some scholars have tried to apply meta-learning theory to solve the S&I-IFD problem and some preliminary results have been achieved. For example, Chang et al. [20] presented a fault identification scheme for bearings in a satellite communication antenna, in which a meta-learning module based on the relation network was applied to measure the correlation degree of vibration data so as to realize bearing fault identification using small samples. In [125], a meta-learning framework based on the meta-relation net was presented for machine fault diagnosis. The experimental results showed that this meta-relation net-based model was suitable for fault classification with a few training samples.

At present, intelligent diagnosis models using meta-learning theory have not been deeply developed. The existing research results are mainly based on the relation network to build diagnosis models; however, the Siamese network, matching network, and prototypical network have not been applied yet. In addition to metric-based approaches, optimization-based and model-based approaches can achieve good results in image classification in the small sample case [138]. How to use them to build intelligent diagnosis models is worthy of further exploration. Overall, meta-learning theory has great potential to solve the problem of S&I-IFD, so it is one of the important directions for future research.

6.4. Zero-shot learning theory and its possible applications in S&I-IFD

Zero-shot learning [147] may bring research breakthroughs in S&I-IFD. Zero-shot learning uses seen data, which have been collected in practice, for training and realizes the recognition of unseen data, which have not been collected. In engineering scenarios, most collected data are under normal conditions and fault data are rare. In extreme cases, researchers cannot obtain fault signals under a certain fault type or under a certain working condition, which means diagnosis models do not have training samples from the unseen data classes. In intelligent fault diagnosis, the recognition of unseen data classes is a quite hard task, which is difficult to accomplish using common diagnosis models. Zero-shot learning is a feasible way to recognize unseen data, and it is a valuable direction for further research in S&I-IFD.

Zero-shot learning realizes the recognition of unseen classes by inferring from seen classes to unseen classes [148], and it has been applied in image recognition widely. Zero-shot learning mainly includes model embedding [149] and feature generation [150], etc. Through training on seen classes, the model can learn the mapping relationship between the data features and their attributes, while the correlativity between the attributes and the data labels is predefined. Based on the learned mapping relationships between the features and the attributes, the model can infer the attributes of unseen classes in the testing stage and realize the recognition of unseen classes through the correlativity between the attributes and the data labels.

In intelligent fault diagnosis, scholars have begun preliminary research on zero-shot data classification. A zero-shot diagnosis model using a contractive auto-encoder was presented in [151] to identify machine faults without faulty samples. Feng et al. [152] used a fault description model based on the attribute transfer strategy for the zero-sample fault classification of complex mechanical systems. Lv et al. [153] proposed a conditional adversarial de-noising auto-encoder for machine fault identification without fault data, which generated unseen classes with the hybrid attribute as conditions.

At present, the research on intelligent diagnosis using zero-shot learning theory has obtained preliminary achievements from the perspectives of data attribute description and data feature generation. In machine fault identification, the attributes of machine monitoring data are related to the monitoring object and the data type. For example, due to the difference in fault form and fault mechanism, the attributes of induction motor monitoring data are different from those of generator monitoring data. Moreover, for some complex equipment, such as aero-engines, the monitoring data include pressure data, temperature data, flow data, vibration data, and so on. These different types of monitoring data have different data attributes. Therefore, how to effectively describe data attributes according to different monitoring objects and data types is one of the key research directions in the future, which is of great value to the application of zero-shot learning-based diagnosis models. In addition, how to learn and generate general data features is an important basis for zero-shot learning.
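To make the attribute-based inference route concrete, here is a toy sketch of zero-shot recognition; the attribute table, the class semantics, and the regressor are entirely hypothetical:

```python
import numpy as np

# Toy attribute table for four fault classes (rows) and three binary
# attributes (columns), e.g. "impulsive", "modulated", "broadband".
# Class 3 is unseen: no training samples exist, only its attributes.
A = np.array([[1, 0, 0],    # class 0: seen
              [0, 1, 0],    # class 1: seen
              [1, 1, 0],    # class 2: seen
              [1, 0, 1]])   # class 3: unseen

def predict_unseen(attr_regressor, x):
    """Attribute-based zero-shot inference: map a test sample to the
    attribute space with a regressor trained on seen classes only,
    then label it with the nearest class attribute vector."""
    a_hat = attr_regressor(x)                       # predicted attributes
    return int(np.argmin(np.linalg.norm(A - a_hat, axis=1)))

# E.g., with a perfect regressor output for an unseen-class sample:
print(predict_unseen(lambda x: np.array([1, 0, 1]), None))   # -> 3
```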
These existing research results are mainly based on auto-encoders to generate data features of unseen classes [151,153]. In the future, how to use other models such as GAN [154] to achieve feature learning and generation is a necessary research direction. Generally speaking, zero-shot learning theory has a strong application value for machine fault diagnosis under small sample conditions. Although there have been some preliminary research results, we think it still has a broad research space. Therefore, how to design effective diagnosis models based on zero-shot learning is an important research direction for S&I-IFD in the future.

7. Conclusions

S&I-IFD has attracted the attention of scholars for a long time. In this paper, we review the research achievements on S&I-IFD, which can be classified into three categories: the data augmentation-based strategy, the feature learning-based strategy, and the classifier design-based strategy. Specifically, the data augmentation-based strategy improves the diagnosis performance on small & imbalanced data by generating, over-sampling, or reweighting the training data samples. The feature learning-based strategy learns the fault features from small & imbalanced data using regularized neural networks or feature adaptation. The classifier design-based strategy achieves high diagnosis accuracy by designing fault classifiers suitable for small & imbalanced data classification.

For future research, how to enhance the quality of the augmented samples is a problem that needs more attention. Besides, how to prevent transfer learning-based diagnosis schemes from negative transfer is a challenge for further applications in engineering scenarios. Finally, meta-learning theory and zero-shot learning theory have great potential in dealing with the S&I-IFD problem, which may bring research breakthroughs in the future.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The authors would like to sincerely thank all the anonymous reviewers for the valuable comments that greatly helped to improve the manuscript.

This research is supported financially by the National Natural Science Foundation of China (No. 91960106, No. 51875436, No. U1933101, No. 61633001, No. 51421004, No. 51965013), China Postdoctoral Science Foundation (No. 2020T130509, No. 2018M631145) and Shaanxi Natural Science Foundation, China (No. 2019JM-041).

References

[1] Pan J, Zi Y, Chen J, Zhou Z, Wang B. LiftingNet: A novel deep learning network with layerwise feature learning from noisy mechanical data for fault classification. IEEE Trans Ind Electron 2018;65:4973–82. http://dx.doi.org/10.1109/TIE.2017.2767540.
[2] Jiang W, Zhou J, Liu H, Shan Y. A multi-step progressive fault diagnosis method for rolling element bearing based on energy entropy theory and hybrid ensemble auto-encoder. ISA Trans 2019;87:235–50. http://dx.doi.org/10.1016/j.isatra.2018.11.044.
[3] Xiang Z, Zhang X, Zhang W, Xia X. Fault diagnosis of rolling bearing under fluctuating speed and variable load based on TCO spectrum and stacking auto-encoder. Meas J Int Meas Confed 2019;138:162–74. http://dx.doi.org/10.1016/j.measurement.2019.01.063.
[4] Zhang K, Chen J, Zhang T, Zhou Z. A compact convolutional neural network augmented with multiscale feature extraction of acquired monitoring data for mechanical intelligent fault diagnosis. J Manuf Syst 2020. http://dx.doi.org/10.1016/j.jmsy.2020.04.016.
[5] Chang Y, Chen J, Qu C, Pan T. Intelligent fault diagnosis of wind turbines via a deep learning network using parallel convolution layers with multi-scale kernels. Renew Energy 2020. http://dx.doi.org/10.1016/j.renene.2020.02.004.
[6] Pan T, Chen J, Zhou Z, Wang C, He S. A novel deep learning network via multiscale inner product with locally connected feature extraction for intelligent fault detection. IEEE Trans Ind Informatics 2019. http://dx.doi.org/10.1109/tii.2019.2896665.
[7] Pan T, Chen J, Pan J, Zhou Z. A deep learning network via shunt-wound restricted Boltzmann machines using raw data for fault detection. IEEE Trans Instrum Meas 2020. http://dx.doi.org/10.1109/TIM.2019.2953436.
[8] Lei Y, Yang B, Jiang X, Jia F, Li N, Nandi AK. Applications of machine learning to machine fault diagnosis: A review and roadmap. Mech Syst Signal Process 2020;138:106587. http://dx.doi.org/10.1016/j.ymssp.2019.106587.
[9] Hashmi MB, Majid MAA, Lemma TA. Combined effect of inlet air cooling and fouling on performance of variable geometry industrial gas turbines. Alexandria Eng J 2020. http://dx.doi.org/10.1016/j.aej.2020.04.050.
[10] Martin-Diaz I, Morinigo-Sotelo D, Duque-Perez O, De Romero-Troncoso RJ. Early fault detection in induction motors using adaboost with imbalanced small data and optimized sampling. IEEE Trans Ind Appl 2017;53:3066–75. http://dx.doi.org/10.1109/TIA.2016.2618756.
[11] Gao L, Ren Z, Tang W, Wang H, Chen P. Intelligent gearbox diagnosis methods based on SVM, wavelet lifting and RBR. Sensors 2010;10:4602–21. http://dx.doi.org/10.3390/s100504602.
[12] Zhang T, Chen J, Li F, Pan T. A small sample focused intelligent fault diagnosis scheme of machines via multi-modules learning with gradient penalized generative adversarial networks, vol. 0046. 2020, http://dx.doi.org/10.1109/TIE.2020.3028821.
[13] Xiao D, Huang Y, Qin C, Liu Z, Li Y, Liu C. Transfer learning with convolutional neural networks for small sample size problem in machinery fault diagnosis. Proc Inst Mech Eng Part C J Mech Eng Sci 2019;233:5131–43. http://dx.doi.org/10.1177/0954406219840381.
[14] Zhao R, Yan R, Chen Z, Mao K, Wang P, Gao RX. Deep learning and its applications to machine health monitoring. Mech Syst Signal Process 2019. http://dx.doi.org/10.1016/j.ymssp.2018.05.050.
[15] Gangsar P, Tiwari R. Signal based condition monitoring techniques for fault detection and diagnosis of induction motors: A state-of-the-art review. Mech Syst Signal Process 2020. http://dx.doi.org/10.1016/j.ymssp.2020.106908.
[16] Haixiang G, Yijing L, Shang J, Mingyun G, Yuanyue H, Bing G. Learning from class-imbalanced data: Review of methods and applications. Expert Syst Appl 2017;73:220–39. http://dx.doi.org/10.1016/j.eswa.2016.12.035.
[17] Wang Y, Yao Q, Kwok JT, Ni LM. Generalizing from a few examples: A survey on few-shot learning. ACM Comput Surv 2020;53. http://dx.doi.org/10.1145/3386252.
[18] Sun Y, Wong AKC, Kamel MS. Classification of imbalanced data: A review. Int J Pattern Recognit Artif Intell 2009;23:687–719. http://dx.doi.org/10.1142/S0218001409007326.
[19] He H, Garcia EA. Learning from imbalanced data. IEEE Trans Knowl Data Eng 2009. http://dx.doi.org/10.1109/TKDE.2008.239.
[20] Chang YH, Chen JL, He SL. Intelligent fault diagnosis of satellite communication antenna via a novel meta-learning network combining with attention mechanism. J Phys Conf Ser 2020;1510. http://dx.doi.org/10.1088/1742-6596/1510/1/012026.
[21] Pan T, Chen J, Qu C, Zhou Z. A method for mechanical fault recognition with unseen classes via unsupervised convolutional adversarial auto-encoder. Meas Sci Technol 2020. http://dx.doi.org/10.1088/1361-6501/abb38.
[22] Govindan K, Jepsen MB. ELECTRE: A comprehensive literature review on methodologies and applications. Eur J Oper Res 2016;250:1–29. http://dx.doi.org/10.1016/j.ejor.2015.07.019.
[23] Yang J, Xie G, Yang Y. An improved ensemble fusion autoencoder model for fault diagnosis from imbalanced and incomplete data. Control Eng Pract 2020;98. http://dx.doi.org/10.1016/j.conengprac.2020.104358.
[24] Zhao X, Jia M, Lin M. Deep Laplacian auto-encoder and its application into imbalanced fault diagnosis of rotating machinery. Meas J Int Meas Confed 2020;152:107320. http://dx.doi.org/10.1016/j.measurement.2019.107320.
[25] Dixit S, Verma NK. Intelligent condition based monitoring of rotary machines with few samples. IEEE Sens J 2020;1748:1. http://dx.doi.org/10.1109/jsen.2020.3008177.
[26] Liu J, Qu F, Hong X, Zhang H. A small-sample wind turbine fault detection. IEEE Trans Ind Informatics 2019;15:3877–88.
[27] Liu Q, Ma G, Cheng C. Data fusion generative adversarial network for multi-class imbalanced fault diagnosis of rotating machinery. IEEE Access 2020;8:70111–24. http://dx.doi.org/10.1109/ACCESS.2020.2986356.
[28] Hang Q, Yang J, Xing L. Diagnosis of rolling bearing based on classification for high dimensional unbalanced data. IEEE Access 2019;7:79159–72. http://dx.doi.org/10.1109/ACCESS.2019.2919406.
[29] Shen F, Chen C, Yan R, Gao RX. Bearing fault diagnosis based on SVD feature extraction and transfer learning classification. In: Proc. 2015 progn. syst. heal. manag. conf. 2016, http://dx.doi.org/10.1109/PHM.2015.7380088.
[30] Zeng Y, Wu X, Chen J. Bearing fault diagnosis with denoising autoencoders in few labeled sample case. In: 2020 5th IEEE int conf big data anal. 2020, p. 349–53. http://dx.doi.org/10.1109/ICBDA49040.2020.9101321.
[31] Saufi SR, Bin ZA, Leong MS, Lim MH. Gearbox fault diagnosis using a deep learning model with limited data sample. IEEE Trans Ind Informatics 2020;16:6263–71. http://dx.doi.org/10.1109/TII.2020.2967822.
[32] Li Q, Tang B, Deng L, Wu Y, Wang Y. Deep balanced domain adaptation neural networks for fault diagnosis of planetary gearboxes with limited labeled data. Meas J Int Meas Confed 2020;156:107570. http://dx.doi.org/10.1016/j.measurement.2020.107570.
[33] Yang B, Lei Y, Jia F, Xing S. A transfer learning method for intelligent fault diagnosis from laboratory machines to real-case machines. In: Proc. - 2018 int. conf. sensing, diagnostics, progn. control. 2019, http://dx.doi.org/10.1109/SDPC.2018.8664814.
[34] Li X, Zhang W, Ding Q, Sun JQ. Multi-layer domain adaptation method for rolling bearing fault diagnosis. Signal Process 2019. http://dx.doi.org/10.1016/j.sigpro.2018.12.005.
[35] Chen F, Tang B, Chen R. A novel fault diagnosis model for gearbox based on wavelet support vector machine with immune genetic algorithm. Meas J Int Meas Confed 2013;46:220–32. http://dx.doi.org/10.1016/j.measurement.2012.06.009.
[36] Deng S, Lin SY, Chang WL. Application of multiclass support vector machines for fault diagnosis of field air defense gun. Expert Syst Appl 2011;38:6007–13. http://dx.doi.org/10.1016/j.eswa.2010.11.020.
[37] Chen F, Tang B, Song T, Li L. Multi-fault diagnosis study on roller bearing based on multi-kernel support vector machine with chaotic particle swarm optimization. Meas J Int Meas Confed 2014;47:576–90. http://dx.doi.org/10.1016/j.measurement.2013.08.021.
[38] Dong X, Gao H, Guo L, Li K, Duan A. Deep cost adaptive convolutional network: A classification method for imbalanced mechanical data. IEEE Access 2020;8:71486–96. http://dx.doi.org/10.1109/ACCESS.2020.2986419.
[39] Zhang C, Tan KC, Li H, Hong GS. A cost-sensitive deep belief network for imbalanced classification. IEEE Trans Neural Networks Learn Syst 2019;30:109–22. http://dx.doi.org/10.1109/TNNLS.2018.2832648.
[40] Peng P, Zhang W, Zhang Y, Xu Y, Wang H, Zhang H. Cost sensitive active learning using bidirectional gated recurrent neural networks for imbalanced fault diagnosis. Neurocomputing 2020;407:232–45. http://dx.doi.org/10.1016/j.neucom.2020.04.075.
[41] Li X, Jiang H, Zhao K, Wang R. A deep transfer nonnegativity-constraint sparse autoencoder for rolling bearing fault diagnosis with few labeled data. IEEE Access 2019;7:91216–24. http://dx.doi.org/10.1109/ACCESS.2019.2926234.
[42] He Z, Shao H, Zhang X, Cheng J, Yang Y. Improved deep transfer auto-encoder for fault diagnosis of gearbox under variable working conditions with small training samples. IEEE Access 2019;7:115368–77. http://dx.doi.org/10.1109/access.2019.2936243.
[43] Kim H, Youn BD. A new parameter repurposing method for parameter transfer with small dataset and its application in fault diagnosis of rolling element bearings. IEEE Access 2019;7:46917–30. http://dx.doi.org/10.1109/ACCESS.2019.2906273.
[44] Chen J, Chang Y, Qu C, Zhang M, Li F, Pan J. Intelligent impulse finder: A boosting multi-kernel learning network using raw data for mechanical fault identification in big data era. ISA Trans 2020. http://dx.doi.org/10.1016/j.isatra.2020.07.039.
[45] Yu Y, Tang B, Lin R, Han S, Tang T, Chen M. CWGAN: Conditional wasserstein generative adversarial nets for fault data generation. In: IEEE int conf robot biomimetics. 2019, p. 2713–8. http://dx.doi.org/10.1109/ROBIO49542.2019.8961501.
[46] Pan T, Chen J, Xie J, Zhou Z, He S. Deep feature generating network: A new method for intelligent fault detection of mechanical systems under class imbalance. IEEE Trans Ind Informatics 2020;3203:1. http://dx.doi.org/10.1109/tii.2020.3030967.
[47] Wu Z, Lin W, Fu B, Guo J, Ji Y, Pecht M. A local adaptive minority selection and oversampling method for class-imbalanced fault diagnostics in industrial systems. IEEE Trans Reliab 2019;1–12. http://dx.doi.org/10.1109/TR.2019.2942049.
[48] Zhang Y, Li X, Gao L, Wang L, Wen L. Imbalanced data fault diagnosis of rotating machinery using synthetic oversampling and feature learning. J Manuf Syst 2018;48:34–50. http://dx.doi.org/10.1016/j.jmsy.2018.04.005.
[49] Goodfellow IJ, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, et al. Generative adversarial nets. Adv Neural Inf Process Syst 2014.
[50] Kingma DP, Welling M. Auto-encoding variational bayes. In: 2nd int. conf. learn. represent. ICLR 2014 - conf. track proc. 2014.
[51] Creswell A, White T, Dumoulin V, Arulkumaran K, Sengupta B, Bharath AA. Generative adversarial networks: An overview. IEEE Signal Process Mag 2018. http://dx.doi.org/10.1109/MSP.2017.2765202.
[52] Shao S, Wang P, Yan R. Generative adversarial networks for data augmentation in machine fault diagnosis. Comput Ind 2019;106:85–93. http://dx.doi.org/10.1016/j.compind.2019.01.001.
[53] Radford A, Metz L, Chintala S. Unsupervised representation learning with deep convolutional generative adversarial networks. In: 4th int. conf. learn. represent. ICLR 2016 - conf. track proc. 2016.
[54] Arjovsky M, Chintala S, Bottou L. Wasserstein generative adversarial networks. In: 34th int. conf. mach. learn. 2017.
[55] Gulrajani I, Ahmed F, Arjovsky M, Dumoulin V, Courville A. Improved training of wasserstein GANs. Adv Neural Inf Process Syst 2017.
[56] Mirza M, Osindero S. Conditional generative adversarial nets. 2014, p. 1–7.
[57] Odena A, Olah C, Shlens J. Conditional image synthesis with auxiliary classifier gans. In: 34th int. conf. mach. learn. 2017.
[58] Salimans T, Goodfellow I, Zaremba W, Cheung V, Radford A, Chen X. Improved techniques for training GANs. Adv Neural Inf Process Syst 2016.
[59] Chen X, Duan Y, Houthooft R, Schulman J, Sutskever I, Abbeel P. InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets. Adv Neural Inf Process Syst 2016.
[60] Yin H, Li Z, Zuo J, Liu H, Yang K, Li F. Wasserstein generative adversarial network and convolutional neural network (WG-CNN) for bearing fault diagnosis. Math Probl Eng 2020;2020. http://dx.doi.org/10.1155/2020/2604191.
[61] Gao X, Deng F, Yue X. Data augmentation in fault diagnosis based on the wasserstein generative adversarial network with gradient penalty. Neurocomputing 2020;396:487–94. http://dx.doi.org/10.1016/j.neucom.2018.10.109.
[62] Zhang W, Li X, Jia XD, Ma H, Luo Z, Li X. Machinery fault diagnosis with imbalanced data using deep generative adversarial networks. Meas J Int Meas Confed 2020;152:107377. http://dx.doi.org/10.1016/j.measurement.2019.107377.
[63] Zhang T, Chen J, Xie J, Pan T. SASLN: Signals augmented self-taught learning networks for mechanical fault diagnosis under small sample condition, vol. 9456. 2020, http://dx.doi.org/10.1109/TIM.2020.3043098.
[64] Wu J, Zhao Z, Sun C, Yan R, Chen X. Ss-infogan for class-imbalance classification of bearing faults. Procedia Manuf 2020;49:99–104. http://dx.doi.org/10.1016/j.promfg.2020.07.003.
[65] Wang Z, Wang J, Wang Y. An intelligent diagnosis scheme based on generative adversarial learning deep neural networks and its application to planetary gearbox fault pattern recognition. Neurocomputing 2018;310:213–22. http://dx.doi.org/10.1016/j.neucom.2018.05.024.
[66] Zou L, Li Y, Xu F. An adversarial denoising convolutional neural network for fault diagnosis of rotating machinery under noisy environment and limited sample size case. Neurocomputing 2020;407:105–20. http://dx.doi.org/10.1016/j.neucom.2020.04.074.
[67] Wang J, Li S, Han B, An Z, Bao H, Ji S. Generalization of deep neural networks for imbalanced fault classification of machinery using generative adversarial networks. IEEE Access 2019;7:111168–80. http://dx.doi.org/10.1109/access.2019.2924003.
[68] Ding Y, Ma L, Ma J, Wang C, Lu C. A generative adversarial network-based intelligent fault diagnosis method for rotating machinery under small sample size conditions. IEEE Access 2019;7:149736–49. http://dx.doi.org/10.1109/ACCESS.2019.2947194.
[69] Mao W, Liu Y, Ding L, Li Y. Imbalanced fault diagnosis of rolling bearing based on generative adversarial network: A comparative study. IEEE Access 2019;7:9515–30. http://dx.doi.org/10.1109/ACCESS.2018.2890693.
[70] Wang YR, Sun GD, Jin Q. Imbalanced sample fault diagnosis of rotating machinery using conditional variational auto-encoder generative adversarial network. Appl Soft Comput J 2020;92:106333. http://dx.doi.org/10.1016/j.asoc.2020.106333.
[71] Zheng T, Song L, Guo B, Liang H, Guo L. An efficient method based on conditional generative adversarial networks for imbalanced fault diagnosis of rolling bearing. In: 2019 progn syst heal manag conf PHM-Qingdao, vol. 2019. 2019, http://dx.doi.org/10.1109/PHM-Qingdao46334.2019.8942906.
[72] Zheng T, Song L, Wang J, Teng W, Xu X, Ma C. Data synthesis using dual discriminator conditional generative adversarial networks for imbalanced fault diagnosis of rolling bearings. Meas J Int Meas Confed 2020;158:107741. http://dx.doi.org/10.1016/j.measurement.2020.107741.
[73] Li Z, Zheng T, Wang Y, Cao Z, Guo Z, Fu H. A novel method for imbalanced fault diagnosis of rotating machinery based on generative adversarial networks. IEEE Trans Instrum Meas 2020;9456:1. http://dx.doi.org/10.1109/tim.2020.3009343.
[74] Zhou F, Yang S, Fujita H, Chen D, Wen C. Deep learning fault diagnosis method based on global optimization GAN for unbalanced data. Knowl-Based Syst 2020;187:104837. http://dx.doi.org/10.1016/j.knosys.2019.07.008.
[75] Cabrera D, Sancho F, Long J, Sanchez RV, Zhang S, Cerrada M, et al. Generative adversarial networks selection approach for extremely imbalanced fault diagnosis of reciprocating machinery. IEEE Access 2019;7:70643–53. http://dx.doi.org/10.1109/ACCESS.2019.2917604.
[76] Liang P, Deng C, Wu J, Yang Z, Zhu J, Zhang Z. Single and simultaneous fault diagnosis of gearbox via a semi-supervised and high-accuracy adversarial learning framework. Knowl-Based Syst 2020;198:105895. http://dx.doi.org/10.1016/j.knosys.2020.105895.
[77] Pan T, Chen J, Xie J, Chang Y, Zhou Z. Intelligent fault identification for industrial automation system via multi-scale convolutional generative adversarial network with partially labeled samples. ISA Trans 2020. http://dx.doi.org/10.1016/j.isatra.2020.01.014.
[78] Zhao D, Liu S, Gu D, Sun X, Wang L, Wei Y, et al. Enhanced data-driven fault diagnosis for machines with small and unbalanced data based on variational auto-encoder. Meas Sci Technol 2019. http://dx.doi.org/10.1088/1361-6501/ab55f8.
[79] Larsen ABL, Sønderby SK, Larochelle H, Winther O. Autoencoding beyond pixels using a learned similarity metric. In: 33rd int. conf. mach. learn. ICML 2016, vol. 4; 2016, p. 2341–9.
[80] Huang H, Yu PS, Wang C. An introduction to image synthesis with generative adversarial nets. 2018, p. 1–17, ArXiv.
[81] Elreedy D, Atiya AF. A comprehensive analysis of synthetic minority oversampling technique (SMOTE) for handling class imbalance. Inf Sci (Ny) 2019;505:32–64. http://dx.doi.org/10.1016/j.ins.2019.07.070.
[82] Wu Z, Lin W, Ji Y. An integrated ensemble learning model for imbalanced fault diagnostics and prognostics. IEEE Access 2018;6:8394–402. http://dx.doi.org/10.1109/ACCESS.2018.2807121.
[83] Soltanzadeh P, Hashemzadeh M. RCSMOTE: Range-controlled synthetic minority over-sampling technique for handling the class imbalance problem. Inf Sci (Ny) 2021;542:92–111. http://dx.doi.org/10.1016/j.ins.2020.07.014.
[84] Wu Z, Jiang H, Lu T, Zhao K. A deep transfer maximum classifier discrepancy method for rolling bearing fault diagnosis under few labeled data. Knowl-Based Syst 2020;196:105814. http://dx.doi.org/10.1016/j.knosys.2020.105814.
[85] Hoang DT, Kang HJ. A bearing fault diagnosis method using transfer learning and Dempster-Shafer evidence theory. In: ACM int. conf. proceeding ser. 2019, p. 33–8. http://dx.doi.org/10.1145/3388218.3388220.
[86] Pan SJ, Yang Q. A survey on transfer learning. IEEE Trans Knowl Data Eng 2010. http://dx.doi.org/10.1109/TKDE.2009.191.
[87] Dai W, Yang Q, Xue GR, Yu Y. Boosting for transfer learning. In: ACM int. conf. proceeding ser. 2007, http://dx.doi.org/10.1145/1273496.1273521.
[88] Zhang A, Li S, Cui Y, Yang W, Dong R, Hu J. Limited data rolling bearing fault diagnosis with few-shot learning. IEEE Access 2019;7:110895–904. http://dx.doi.org/10.1109/ACCESS.2019.2934233.
[89] Hu Y, Gao J, Zhou Q, Fan Z. Bearing fault diagnosis based on deep semisupervised small sample classifier. In: 2019 progn. syst. heal. manag. conf. PHM-Qingdao 2019. 2019, http://dx.doi.org/10.1109/PHM-Qingdao46334.2019.8943025.
[90] Wang T, Wang J, Wu Y, Sheng X. A fault diagnosis model based on weighted extension neural network for turbo-generator sets on small samples with noise. Chinese J Aeronaut 2020. http://dx.doi.org/10.1016/j.cja.2020.06.024.
[91] Dong L, Liu S, Zhang H. A method of anomaly detection and fault diagnosis with online adaptive learning under small training samples. Pattern Recognit 2017;64:374–85. http://dx.doi.org/10.1016/j.patcog.2016.11.026.
[92] Jiao J, Zhao M, Lin J, Liang K. A comprehensive review on convolutional neural network in machine fault diagnosis. Neurocomputing 2020. http://dx.doi.org/10.1016/j.neucom.2020.07.088.
[93] Zhao B, Zhang X, Li H, Yang Z. Intelligent fault diagnosis of rolling bearings based on normalized CNN considering data imbalance and variable working conditions. Knowl-Based Syst 2020;199:105971. http://dx.doi.org/10.1016/j.knosys.2020.105971.
[94] Jia F, Lei Y, Lu N, Xing S. Deep normalized convolutional neural network for imbalanced fault classification of machinery and its understanding via visualization. Mech Syst Signal Process 2018;110:349–67. http://dx.doi.org/10.1016/j.ymssp.2018.03.025.
[95] Ren Z, Zhu Y, Yan K, Chen K, Kang W, Yue Y, et al. A novel model with the ability of few-shot learning and quick updating for intelligent fault diagnosis. Mech Syst Signal Process 2020;138. http://dx.doi.org/10.1016/j.ymssp.2019.106608.
[96] Jia F, Li S, Zuo H, Shen J. Deep neural network ensemble for the intelligent fault diagnosis of machines under imbalanced data. IEEE Access 2020;8:120974–82. http://dx.doi.org/10.1109/ACCESS.2020.3006895.
[97] Xu K, Li S, Jiang X, An Z, Wang J, Yu T. A renewable fusion fault diagnosis network for the variable speed conditions under unbalanced samples. Neurocomputing 2020;379:12–29. http://dx.doi.org/10.1016/j.neucom.2019.08.099.
[98] Geng Y, Wang Z, Jia L, Qin Y, Chen X. Bogie fault diagnosis under variable operating conditions based on fast kurtogram and deep residual learning towards imbalanced data. Meas J Int Meas Confed 2020;166. http://dx.doi.org/10.1016/j.measurement.2020.108191.
[99] Liu S, Sun Y, Zhang L. A novel fault diagnosis method based on noise-assisted MEMD and functional neural fuzzy network for rolling element bearings. IEEE Access 2018;6:27048–68. http://dx.doi.org/10.1109/ACCESS.2018.2833851.
[100] Qian W, Li S. A novel class imbalance-robust network for bearing fault diagnosis utilizing raw vibration signals. Meas J Int Meas Confed 2020;156:107567. http://dx.doi.org/10.1016/j.measurement.2020.107567.
[101] Pan SJ, Tsang IW, Kwok JT, Yang Q. Domain adaptation via transfer component analysis. IEEE Trans Neural Netw 2011. http://dx.doi.org/10.1109/TNN.2010.2091281.
[102] Long M, Wang J, Ding G, Sun J, Yu PS. Transfer feature learning with joint distribution adaptation. In: Proc. IEEE int. conf. comput. vis. 2013, http://dx.doi.org/10.1109/ICCV.2013.274.
[103] Han T, Liu C, Yang W, Jiang D. A novel adversarial learning framework in deep convolutional neural network for intelligent diagnosis of mechanical faults. Knowl-Based Syst 2019;165:474–87. http://dx.doi.org/10.1016/j.knosys.2018.12.019.
[104] Chen C, Li Z, Yang J, Liang B. A cross domain feature extraction method based on transfer component analysis for rolling bearing fault diagnosis. In: Proc. 29th Chinese control decis. conf. 2017, http://dx.doi.org/10.1109/CCDC.2017.7978168.
[105] Xie J, Zhang L, Duan L, Wang J. On cross-domain feature fusion in gearbox fault diagnosis under various operating conditions based on transfer component analysis. In: 2016 IEEE int. conf. progn. heal. manag. 2016, http://dx.doi.org/10.1109/ICPHM.2016.7542845.
[106] Duan L, Xie J, Wang K, Wang J. Gearbox diagnosis based on auxiliary monitoring datasets of different working conditions. Zhendong Yu Chongji/J Vib Shock 2017. http://dx.doi.org/10.13465/j.cnki.jvs.2017.10.017.
[107] Han T, Liu C, Yang W, Jiang D. Deep transfer network with joint distribution adaptation: A new intelligent fault diagnosis framework for industry application. ISA Trans 2020. http://dx.doi.org/10.1016/j.isatra.2019.08.012.
[108] Qian W, Li S, Yi P, Zhang K. A novel transfer learning method for robust fault diagnosis of rotating machines under variable working conditions. Meas J Int Meas Confed 2019. http://dx.doi.org/10.1016/j.measurement.2019.02.073.
[109] Qian W, Li S, Jiang X. Deep transfer network for rotating machine fault analysis. Pattern Recognit 2019;96. http://dx.doi.org/10.1016/j.patcog.2019.106993.
[110] Zhang Z, Chen H, Li S, An Z. Unsupervised domain adaptation via enhanced transfer joint matching for bearing fault diagnosis. Meas J Int Meas Confed 2020;165:108071. http://dx.doi.org/10.1016/j.measurement.2020.108071.
[111] Cheng C, Zhou B, Ma G, Wu D, Yuan Y. Wasserstein distance based deep adversarial transfer learning for intelligent fault diagnosis with unlabeled or insufficient labeled data. Neurocomputing 2020;409:35–45. http://dx.doi.org/10.1016/j.neucom.2020.05.040.
[112] He Z, Shao H, Wang P, Lin J, Cheng J, Yang Y. Deep transfer multi-wavelet auto-encoder for intelligent fault diagnosis of gearbox with few target training samples. Knowl-Based Syst 2020;191:105313. http://dx.doi.org/10.1016/j.knosys.2019.105313.
[113] Zhang R, Liu Y. Research on development and application of support vector machine - Transformer fault diagnosis. In: ACM int. conf. proceeding ser. 2018, p. 262–8. http://dx.doi.org/10.1145/3305275.3305328.
[114] Chen G, Ge Z. SVM-tree and SVM-forest algorithms for imbalanced fault classification in industrial processes. IFAC J Syst Control 2019;8:100052. http://dx.doi.org/10.1016/j.ifacsc.2019.100052.
[115] Wagner C, Saalmann P, Hellingrath B. Machine condition monitoring and fault diagnostics with imbalanced data sets based on the KDD process. IFAC-PapersOnLine 2016;49:296–301. http://dx.doi.org/10.1016/j.ifacol.2016.11.151.
[116] Xi PP, Zhao YP, Wang PX, Li ZQ, Pan YT, Song FQ. Least squares support vector machine for class imbalance learning and their applications to fault detection of aircraft engine. Aerosp Sci Technol 2019;84:56–74. http://dx.doi.org/10.1016/j.ast.2018.08.042.
[117] Malik H, Mishra S. Proximal support vector machine (PSVM) based imbalance fault diagnosis of wind turbine using generator current signals. Energy Procedia 2016;90:593–603. http://dx.doi.org/10.1016/j.egypro.2016.11.228.
[118] He Z, Shao H, Cheng J, Zhao X, Yang Y. Support tensor machine with dynamic penalty factors and its application to the fault diagnosis of rotating machinery with unbalanced data. Mech Syst Signal Process 2020;141:106441. http://dx.doi.org/10.1016/j.ymssp.2019.106441.
[119] Duan L, Xie M, Bai T, Wang J. A new support vector data description method for machinery fault diagnosis with unbalanced datasets. Expert Syst Appl 2016;64:239–46. http://dx.doi.org/10.1016/j.eswa.2016.07.039.

[120] Mathew J, Pang CK, Luo M, Leong WH. Classification of imbalanced data by oversampling in kernel space of support vector machines. IEEE Trans Neural Networks Learn Syst 2018;29:4065–76. http://dx.doi.org/10.1109/TNNLS.2017.2751612.
[121] Duan A, Guo L, Gao H, Wu X, Dong X. Deep focus parallel convolutional neural network for imbalanced classification of machinery fault diagnostics. IEEE Trans Instrum Meas 2020. http://dx.doi.org/10.1109/TIM.2020.2998233.
[122] Chaudhari S, Polatkan G, Ramanath R, Mithal V. An attentive survey of attention models. 2019, ArXiv.
[123] Li F, Chen J, Pan J, Pan T. Cross-domain learning in rotating machinery fault diagnosis under various operating conditions based on parameter transfer. Meas Sci Technol 2020. http://dx.doi.org/10.1088/1361-6501/ab6ade.
[124] Cao P, Zhang S, Tang J. Preprocessing-free gear fault diagnosis using small datasets with deep convolutional neural network-based transfer learning. IEEE Access 2018;6:26241–53. http://dx.doi.org/10.1109/ACCESS.2018.2837621.
[125] Wu J, Zhao Z, Sun C, Yan R, Chen X. Few-shot transfer learning for intelligent fault diagnosis of machine. Meas J Int Meas Confed 2020;166:108202. http://dx.doi.org/10.1016/j.measurement.2020.108202.
[126] Yu K, Lin TR, Ma H, Li X, Li X. A multi-stage semi-supervised learning approach for intelligent fault diagnosis of rolling bearing using data augmentation and metric learning. Mech Syst Signal Process 2021;146. http://dx.doi.org/10.1016/j.ymssp.2020.107043.
[127] Li X, Zhang W, Ding Q, Sun JQ. Intelligent rotating machinery fault diagnosis based on deep learning using data augmentation. J Intell Manuf 2020;31:433–52. http://dx.doi.org/10.1007/s10845-018-1456-1.
[128] Lv H, Chen J, Zhang T, Hou R, Pan T, Zhou Z. SDA: Regularization with cut-flip and mix-normal for machinery fault diagnosis under small dataset. ISA Trans 2020. http://dx.doi.org/10.1016/j.isatra.2020.11.005.
[129] Han S, Oh J, Jeong J. Bearing fault detection with data augmentation based on 2-d CNN and 1-d CNN. In: ACM int. conf. proceeding ser. 2020, p. 20–3. http://dx.doi.org/10.1145/3421537.3421546.
[130] Cubuk ED, Zoph B, Mane D, Vasudevan V, Le QV. Autoaugment: Learning augmentation strategies from data. In: Proc. IEEE comput. soc. conf. comput. vis. pattern recognit. 2019, http://dx.doi.org/10.1109/CVPR.2019.00020.
[131] Tao X, Ren C, Li Q, Guo W, Liu R, He Q, et al. Bearing defect diagnosis based on semi-supervised kernel local Fisher discriminant analysis using pseudo labels. ISA Trans 2020. http://dx.doi.org/10.1016/j.isatra.2020.10.033.
[132] Tan B, Song Y, Zhong E, Yang Q. Transitive transfer learning. In: Proc. ACM SIGKDD int. conf. knowl. discov. data min. 2015, http://dx.doi.org/10.1145/2783258.2783295.
[133] Tan B, Zhang Y, Pan SJ, Yang Q. Distant domain transfer learning. In: 31st AAAI conf. artif. intell. 2017.
[134] Mai S, Hu H, Xu J. Attentive matching network for few-shot learning. Comput Vis Image Underst 2019;187:102781. http://dx.doi.org/10.1016/j.cviu.2019.07.001.
[135] Ali AR, Gabrys B, Budka M. Cross-domain meta-learning for time-series forecasting. Procedia Comput Sci 2018;126:9–18. http://dx.doi.org/10.1016/j.procs.2018.07.204.
[136] Lee Y, Choi S. Gradient-based meta-learning with learned layerwise metric and subspace. In: 35th int. conf. mach. learn. 2018.
[137] Finn C, Abbeel P, Levine S. Model-agnostic meta-learning for fast adaptation of deep networks. In: 34th int. conf. mach. learn. 2017.
[138] Hospedales T, Antoniou A, Micaelli P, Storkey A. Meta-learning in neural networks: A survey. 2020, p. 1–20, ArXiv.
[139] Ravi S, Larochelle H. Optimization as a model for few-shot learning. In: Proc. 5th int. conf. learn. represent. 2017.
[140] Mishra N, Rohaninejad M, Chen X, Abbeel P. A simple neural attentive meta-learner. In: 6th int. conf. learn. represent. ICLR 2018 - conf. track proc. 2018.
[141] Qiao S, Liu C, Shen W, Yuille A. Few-shot image recognition by predicting parameters from activations. In: Proc. IEEE comput. soc. conf. comput. vis. pattern recognit. 2018, http://dx.doi.org/10.1109/CVPR.2018.00755.
[142] van der Spoel E, Rozing MP, Houwing-Duistermaat JJ, Eline Slagboom P, Beekman M, de Craen AJM, et al. Siamese neural networks for one-shot image recognition. In: ICML - deep learn. work. 2015.
[143] Vinyals O, Blundell C, Lillicrap T, Kavukcuoglu K, Wierstra D. Matching networks for one shot learning. Adv Neural Inf Process Syst 2016.
[144] Snell J, Swersky K, Zemel R. Prototypical networks for few-shot learning. Adv Neural Inf Process Syst 2017.
[145] Sung F, Yang Y, Zhang L, Xiang T, Torr PHS, Hospedales TM. Learning to compare: Relation network for few-shot learning. In: Proc. IEEE comput. soc. conf. comput. vis. pattern recognit. 2018, http://dx.doi.org/10.1109/CVPR.2018.00131.
[146] Zhang K, Chen J, Zhang T, He S, Pan T, Zhou Z. Intelligent fault diagnosis of mechanical equipment under varying working condition via iterative matching network augmented with selective signal reuse strategy. J Manuf Syst 2020;57:400–15. http://dx.doi.org/10.1016/j.jmsy.2020.10.007.
[147] Lampert CH, Nickisch H, Harmeling S. Learning to detect unseen object classes by between-class attribute transfer. In: 2009 IEEE comput. soc. conf. comput. vis. pattern recognit. work. (CVPR work.). 2009, http://dx.doi.org/10.1109/CVPRW.2009.5206594.
[148] Romera-Paredes B, Torr PHS. An embarrassingly simple approach to zero-shot learning. In: 32nd int. conf. mach. learn. 2015.
[149] Changpinyo S, Chao WL, Sha F. Predicting visual exemplars of unseen classes for zero-shot learning. In: Proc. IEEE int. conf. comput. vis. 2017, http://dx.doi.org/10.1109/ICCV.2017.376.
[150] Xian Y, Lorenz T, Schiele B, Akata Z. Feature generating networks for zero-shot learning. In: Proc. IEEE comput. soc. conf. comput. vis. pattern recognit. 2018, http://dx.doi.org/10.1109/CVPR.2018.00581.
[151] Gao Y, Gao L, Li X, Zheng Y. A zero-shot learning method for fault diagnosis under unknown working loads. J Intell Manuf 2020. http://dx.doi.org/10.1007/s10845-019-01485-w.
[152] Feng L, Zhao C. Fault description based attribute transfer for zero-sample industrial fault diagnosis. IEEE Trans Ind Informatics 2020. http://dx.doi.org/10.1109/TII.2020.2988208.
[153] Lv H, Chen J, Pan T, Zhou Z. Hybrid attribute conditional adversarial denoising autoencoder for zero-shot classification of mechanical intelligent fault diagnosis. Appl Soft Comput J 2020. http://dx.doi.org/10.1016/j.asoc.2020.106577.
[154] Xian Y, Lorenz T, Schiele B, Akata Z. Feature generating networks for zero-shot learning. In: Proc. IEEE comput. soc. conf. comput. vis. pattern recognit. 2018, http://dx.doi.org/10.1109/CVPR.2018.00581.
