Mathematics 10 02066
Article
LSTM-Based Broad Learning System for Remaining Useful
Life Prediction
Xiaojia Wang 1, * , Ting Huang 1 , Keyu Zhu 1 and Xibin Zhao 2
Abstract: Prognostics and health management (PHM) is gradually being applied to production
management processes as industrial production transforms into intelligent production, leading to
increased demands on the reliability of industrial equipment.
Remaining useful life (RUL) prediction plays a pivotal role in this process. Accurate prediction results
can effectively provide information about the condition of the equipment on which intelligent
maintenance can be based, with many methods applied to this task. However, the current problems
of inadequate feature extraction and poor correlation between prediction results and data still affect
the prediction accuracy. To overcome these obstacles, we constructed a new fusion model, named
the B-LSTM, that extracts data features based on a broad learning system (BLS) and embeds long
short-term memory (LSTM) to process time-series information. First, the LSTM controls the transmission
of information from the data to the gate mechanism, and the retained information generates the
mapped features and forms the feature nodes. Then, the random feature nodes are supplemented by
an activation function that generates enhancement nodes with greater expressive power, increasing
the nonlinear factor in the network, and eventually the feature nodes and enhancement nodes are
jointly connected to the output layer. The B-LSTM was experimentally used with the C-MAPSS
dataset, and the results of comparison with several mainstream methods showed that the new model
achieved significant improvements.
Keywords: remaining useful life (RUL) prediction; broad learning system (BLS); long short-term
memory (LSTM); feature extraction
Citation: Wang, X.; Huang, T.; Zhu, K.; Zhao, X. LSTM-Based Broad Learning System for Remaining
Useful Life Prediction. Mathematics 2022, 10, 2066. https://doi.org/10.3390/math10122066
based on a particular piece of equipment and combined with the empirical knowledge of
that equipment and the defect growth equation [4,5] to predict the RUL of the equipment.
A model-based prognostic method was developed to overcome the influence of the number
of sensors on the prediction results. The method was not only an innovation in prediction
methods but also demonstrated its superiority in reducing the sensor
set [6]. Model-based approaches have been shown to be robust in limited sensing scenarios.
In addition, a method was proposed for online evaluation in cases where little is known
about the degradation process and extreme cases are considered: the entire degradation
process from start of operation to failure is not observed [7]. El Mejdoubi et al. [8] considered
aging conditions in predicting the RUL of supercapacitors where the posterior values of
capacitance and resistance are predicted by means of particle filters. Gears are important
transmission components and accurate RUL prediction is very important to determine the
condition of gearing systems. The accuracy of prediction using the digital twin method was
significantly improved due to its comprehensive health indicators [9]. However, physics-
based methods require corresponding degradation models based on specific objects and
usually are not universal. In addition, as the complexity of the equipment increases, it
becomes difficult to model the failure of system objects, limiting the development of RUL
prediction methods by model construction.
There are two important branches of data-driven methods, namely, statistical data-
driven methods and machine-learning (ML)-based methods, which are the current mainstream
methods for RUL prediction [10]. Statistical data-driven approaches are used to predict
system status based on monitoring data through statistical models without making as-
sumptions or empirical estimates of physical parameters. Park and Padgett [11] provided a
new model of accelerated degradation, mainly for failures following geometric Brownian motion or
gamma processes, with approximation operations using Birnbaum–Saunders and inverse
Gaussian distributions. Chehade and Hussein [12] proposed a multioutput convolutional
Gaussian process (MCGP) model that captures the cross-correlation between the capacities
of available battery cells and is very effective for long-term capacity prediction of lithium-
ion (Li-ion) batteries. Van Noortwijk et al. [13] proposed a method that combines two
stochastic processes to assess reliability over time. In [14], a degradation model based on
the Wiener process and using recursive filters to update the drift coefficients was developed
to predict the RUL. The prediction accuracy of physics-based methods depends on the
choice of degradation model, but the degradation models are distinctive for different types
of equipment. By contrast, statistical data-driven methods are valid in overcoming the
problems associated with model selection.
Recently, ML has matured in applications such as data mining, speech recognition,
computer vision, fault diagnosis and RUL prediction due to its powerful data processing
capabilities. ML-based prediction methods can overcome the problem of unknown degra-
dation models, as the input is not limited by the type of data but can be many different
types of data. ML used to predict RUL can be divided into shallow ML and deep learning
(DL) methods. The common shallow ML prediction methods are back-propagation (BP),
extreme learning machines (ELMs), support vector machines (SVMs) and relevance vector
machines (RVMs). BP-based neural networks have good long-term predictive capabilities.
Gebraeel et al. [15] established neural-network-based models to train the vibration data of
the bearings to obtain the expected failure time of the bearings. Since a single BP neural
network faces the problem of the weights falling into local optima and slow convergence
during training, some approaches combining other methods with BP algorithms have
been proposed. In [16], Wang et al. predicted the distribution of RUL of cooling fans by
building a time-series ARIMA model; the combination with a BP neural network model
improved the feature extraction ability of the model and improved the prediction accuracy.
ELMs have features such as fast learning speed and high generalization ability, and these
advantages are used in RUL prediction to feed the extracted features into ELM models for
training, thus improving the prediction accuracy [17]. Maior et al. [18] presented a method
combining empirical mode decomposition and SVM for degradation data analysis and
RUL prediction. Improving the accuracy of prediction under uncertainty is a problem that
urgently needs to be solved. Wang et al. [19] extended the RVM to the probability manifold
to eliminate the negative impact of the RVM evidence approximation and underestimation
of hyperparameters on the prediction. Although some studies corroborate the effectiveness
of shallow ML in the field of RUL prediction, traditional shallow ML algorithms rely heavily
on the prior knowledge of experts and signal processing techniques, making it difficult to
automatically process and analyze large amounts of monitoring data.
By contrast, DL models aim to build deep neural network architectures that com-
bine low-dimensional features of the data to form more abstract high-level attributes with
strong feature learning capabilities. In 2006, the greedy layer-wise pretraining method
was proposed, achieving a theoretical breakthrough in DL [20]. Subsequently, DL has had a
wide range of applications in several fields, such as image recognition [21], speech recog-
nition [22], fault diagnosis [23] and RUL prediction [24]. Deutsch et al. [25] combined
the feature extraction capabilities of deep belief networks (DBNs) with the superior predictive capabilities of
feedforward neural networks (FNNs) in predicting the RUL of rotating equipment. Based
on this approach, to obtain the probability distribution of the remaining lifetime, DBN was
effectively combined with particle filtering to further improve the prediction accuracy [26].
Deep neural networks (DNNs) have poor long-term prediction accuracy and need to be
combined with other methods for better performance. A convolutional neural network
(CNN) is a classical feedforward neural network with excellent characteristics such as pa-
rameter sharing and spatial pooling. In [27], a deep convolutional neural network (DCNN)
and time window approach were utilized for sample preparation and demonstrated the
extraction of more efficient features. To facilitate the fusion of comprehensive information,
a RUL prediction method was proposed that learns salient features automatically from
multiscale convolutional neural networks (MSCNN) and reveals the nonsmoothness of
bearing degradation signals through time–frequency representation (TFR) [28]. In contrast
to CNNs, recurrent neural networks (RNNs) contain both feedforward connections and
internal feedback connections. Their special network structure retains the hidden-layer
information from the previous time step and is often used to process monitoring vector sequences with
interdependent properties. Heimes [29] realized the prediction of the RUL based on the
RNN structure. However, due to the problems of vanishing and exploding gradients,
RNNs produce large prediction biases when processing long-term monitoring sequences [30]. To
effectively address long-term sequence problems, LSTM was built on top of the RNN, introducing
a gate structure that determines which information features are passed on under optimal
conditions [31]. Zhao et al. [32] constructed a hybrid model
based on the capsule neural network and long short-term memory network (Cap-LSTM)
to extract multivariate time-series sensor data, where the model is feature sensitive and
feature information is fully utilized resulting in improved prediction accuracy. A number
of variants have been proposed based on typical LSTM networks. The attention mechanism
can highlight key parts of time-series information and improve accuracy when predicting.
The local features of the original signal sequence were extracted by a one-dimensional
convolutional neural network, combined with a LSTM network and attention mechanism
to analyze sensor signals and predict RUL, improving the robustness of the model and
obtaining higher prediction accuracy [33]. DL is widely used for RUL prediction due to
its feature representation being stronger than shallow ML and its ability to handle large
amounts of data.
In addition, hybrid approaches based on physics-based and data-driven approaches
were developed, such as the method of Sun et al. [34], where empirical mode decomposition,
Wiener processes and neural networks were combined to take full advantage of both
physical models and data-driven approaches. However, it is not easy to design a structure
that reflects the advantages of both methods; thus, the use of hybrid methods to predict
RUL is uncommon.
Although current ML algorithms perform well in the field of RUL prediction, and in
particular LSTM is effective in handling time-series data, there are still some drawbacks to
overcome when applying ML to RUL prediction. First of all, the existing methods suffer
from inadequate feature representation in RUL prediction, which affects their accuracy.
Secondly, the existing prediction models have to reconstruct the whole model and retrain
the parameters when a new data input is available, which is less efficient. To address this
problem, a new LSTM-based BLS algorithm is proposed. On the one hand, the BLS has
powerful feature representation and prediction capabilities and can accurately represent the
relationship between data characteristics and predicted outcomes. Meanwhile, compared
with DL, the BLS has a simple structure, a high training speed and the advantage of
incremental learning. When the network does not reach the expected performance, only
incremental learning is required and only the incremental part needs to be computed
without rebuilding the entire network. This significantly improves the efficiency of data
processing. In addition, LSTM can effectively process time-series data and avoid problems
such as parameter setting and single-time prediction randomness. On the other hand,
we hope to broaden the theoretical study of BLS networks by constructing a new fusion
network and applying it to practical production scenarios to create economic benefits.
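The incremental-learning property of the BLS can be illustrated with a small sketch. This is a toy example, not the paper's implementation: when new enhancement-node columns are appended to the state matrix A, the output weights W = A⁺Y can be updated from the existing pseudo-inverse via a Greville-style block update in the spirit of [35], rather than recomputed from scratch. All shapes and data below are illustrative assumptions.

```python
import numpy as np

def add_enhancement_nodes(A, A_pinv, H_new):
    """Update the pseudo-inverse of A when columns H_new are appended,
    so [A | H_new]^+ (and hence W = [A | H_new]^+ Y) needs no full recomputation."""
    D = A_pinv @ H_new            # projection of new columns onto existing basis
    C = H_new - A @ D             # component of H_new outside the column space of A
    if np.linalg.norm(C) > 1e-10:
        B = np.linalg.pinv(C)     # generic case: C has full column rank
    else:
        B = np.linalg.solve(np.eye(D.shape[1]) + D.T @ D, D.T @ A_pinv)
    return np.vstack([A_pinv - D @ B, B])

rng = np.random.default_rng(2)
A = rng.standard_normal((30, 5))        # existing feature + enhancement nodes
H_new = rng.standard_normal((30, 3))    # incrementally added enhancement nodes
inc = add_enhancement_nodes(A, np.linalg.pinv(A), H_new)
full = np.linalg.pinv(np.hstack([A, H_new]))  # what a full rebuild would compute
print(np.allclose(inc, full))  # True
```

Only the blocks involving the new columns are computed; the pseudo-inverse of the existing network is reused, which is what makes extending the network cheap compared with rebuilding it.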
We propose a method for predicting RUL that takes into account both feature extraction
and time-series information. It is hoped that sufficient feature extraction can improve the
prediction performance, and appropriately broaden the theory and application of the BLS.
Specifically, the main contributions and innovations of the work we have conducted are
listed below:
(1) A new LSTM-based BLS prediction method is proposed to extract the time-series
features of the data based on feature extraction, improving the ability of the prediction
results to represent the data features and enhancing the RUL prediction accuracy.
(2) The mechanism of model construction represents another innovation. Instead of
directly splicing the two methods, the new method is embedded by modifying the
internal structure and avoiding the redundancy of the model.
(3) The adaptation on the basis of a BLS enriches the practical significance of the BLS
framework, extends the scope of theoretical research and enables the achievement of
better results by integrating the BLS with other methods.
The rest of the paper is structured as follows. The constructed B-LSTM model and the
required related basics are introduced and presented in Section 2. Section 3 presents the
experimental data required and the experimental design. Section 4 applies the dataset for
validation of the model performance and comparison with other methods. A summary of
the model and possible future research directions are shown in Section 5.
2. Related Work
2.1. Broad Learning System (BLS)
With the continuous development of deep learning, deep networks are widely used in
various research fields, but the disadvantages are also more obvious. In order to achieve
higher accuracy, the number of network layers has to be gradually increased; however,
this consumes more computational resources and causes overfitting to occur in small
sample data processing. A BLS is built based on a single hidden layer neural network
and uses lateral scaling to improve accuracy and avoid complex hyperparameters. The
unique feature node and enhancement node structure also provides a strong guarantee for
adequate feature extraction.
The BLS was established by C. L. Philip Chen on the basis of a random vector
functional-link neural network (RVFLNN) and compensated for its shortcomings in
handling large-volume and time-varying data [35]. In addition, the multiple variants
proposed in the course of subsequent research showed flexibility, stability and remarkable
results in classification and regression of semi-supervised and unsupervised tasks [36]. The
structure of the BLS is shown in Figure 1. The network structure is constructed using the
following steps. First, the mapping of input data to feature nodes is established, and then
the enhancement nodes are formed through a nonlinear activation function. Eventually, the
feature nodes and the enhancement nodes are combined as outputs, and the output weight
can be directly found through pseudo-inverse ridge regression.
Figure 1. Structure of the BLS network. The yellow ellipse on the left is the calculation formula for
converting input data into feature nodes, the earthy yellow circle on the left is the generated feature
nodes, the green rectangle on the right is the calculation formula for converting the left feature nodes
into enhancement nodes, the light yellow circle on the right is the generated enhancement nodes, and
the blue circle on the top is the output.
The BLS is constructed as follows:
The given data are subjected to a random weight matrix for feature mapping to obtain
a feature matrix, with the aim of dimensionality reduction and feature extraction. The $i$th
mapped feature is:
$$Z_i = \phi(XW_{ei} + \beta_{ei}), \quad i = 1, 2, \ldots, n \tag{1}$$
$$\begin{aligned}
Y &= [Z_1, \ldots, Z_n \mid \xi(Z^n W_{h1} + \beta_{h1}), \ldots, \xi(Z^n W_{hm} + \beta_{hm})]W_{nm} \\
&= [Z^n \mid H_1, \ldots, H_m]W_{nm} \\
&= [Z^n \mid H^m]W_{nm} \\
&= A_{nm}W_{nm}
\end{aligned} \tag{3}$$
$W_{nm}$ are the weights that connect the feature nodes and the enhancement nodes to
the output, and $W_{nm} = [Z^n \mid H^m]^{+} Y$, where the pseudo-inverse $[Z^n \mid H^m]^{+}$ can be directly
calculated by ridge regression.
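The construction above can be sketched in a few lines of NumPy. This is a minimal toy, and the shapes, the identity feature mapping, the tanh enhancement activation and the ridge coefficient are illustrative assumptions rather than the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 50 samples, 8 input features, scalar target (hypothetical shapes).
X = rng.standard_normal((50, 8))
Y = rng.standard_normal((50, 1))

n_feature_groups, nodes_per_group = 4, 6   # n groups of mapped feature nodes
n_enhance = 10                             # enhancement nodes

# Step 1: random feature mapping Z_i = phi(X W_ei + beta_ei); phi = identity here.
Z = np.hstack([X @ rng.standard_normal((8, nodes_per_group))
               + rng.standard_normal(nodes_per_group)
               for _ in range(n_feature_groups)])

# Step 2: enhancement nodes H = xi(Z W_h + beta_h), with nonlinear xi = tanh.
H = np.tanh(Z @ rng.standard_normal((Z.shape[1], n_enhance))
            + rng.standard_normal(n_enhance))

# Step 3: concatenate A = [Z | H] and find the output weights W = A^+ Y
# via ridge regression (regularized pseudo-inverse).
A = np.hstack([Z, H])
lam = 1e-3  # ridge regularization coefficient (assumed)
W = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ Y)

Y_hat = A @ W
print(Y_hat.shape)  # (50, 1)
```

Note that only the output weights are learned; the feature and enhancement weights stay random, which is why training reduces to one regularized least-squares solve.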
Figure 2. Long short-term memory cell structure. The blue circle at the bottom is the input data of a
cell of the LSTM. The orange rectangular part is the forget gate, input gate and output gate. The
yellow part is the calculation formula for generating the forget gate, input gate and output gate. The
orange circular part is the operator for generating the forget gate, input gate and output gate.
Specifically, the update process of a cell state can be divided into the following steps:
(1) determine what useless information is discarded from the state of the previous time
step; (2) extract the valid information that can be added to the state cell at the current time
step; (3) calculate the state unit of the current time step; and (4) calculate the output of the
current time step.
Variants of LSTM used as part of a prediction model to take full advantage of LSTM in
processing time-series information are also a common approach in current research.
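The four update steps can be sketched as a single-step NumPy cell. The gate ordering, stacked-weight layout and dimensions here are illustrative assumptions, not a specific library's convention.

```python
import numpy as np

def lstm_cell_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM cell update; W, U, b hold the four gates stacked as [f, i, g, o]."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    d = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b            # pre-activations for all four gates
    f = sigmoid(z[0:d])                     # (1) forget gate: discard useless info
    i = sigmoid(z[d:2*d])                   # (2) input gate: select valid new info
    g = np.tanh(z[2*d:3*d])                 #     candidate cell state
    c_t = f * c_prev + i * g                # (3) update the cell state
    o = sigmoid(z[3*d:4*d])                 # (4) output gate -> hidden output
    h_t = o * np.tanh(c_t)
    return h_t, c_t

rng = np.random.default_rng(1)
n_in, n_hid = 3, 4                          # toy sizes (assumed)
W = rng.standard_normal((4 * n_hid, n_in))
U = rng.standard_normal((4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)
h, c = np.zeros(n_hid), np.zeros(n_hid)
h, c = lstm_cell_step(rng.standard_normal(n_in), h, c, W, U, b)
```

Iterating this step over a monitoring sequence, carrying (h, c) forward, is what lets the cell retain long-term degradation information without the vanishing-gradient problem of a plain RNN.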
C-MAPSS Sub-Dataset      FD001   FD002   FD003   FD004
Training trajectories      100     260     100     249
Testing trajectories       100     259     100     248
Operating conditions         1       6       1       6
Fault modes                  1       1       2       2
Each of the four sub-datasets contains a training set and a test set in which the actual
RUL values of the test engine are also included. The training set includes all data from the
start of the turbofan engine’s operation until its degradation and failure. In the test set,
however, the data start from a healthy state and are subsequently arbitrarily truncated; the
operating time periods up to the point of system failure were calculated from these data.
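Because each training unit runs until failure, per-cycle RUL labels follow directly from each unit's final recorded cycle. A tiny illustration on hypothetical two-engine data (not C-MAPSS itself):

```python
import numpy as np

# Hypothetical run-to-failure records: (unit_id, cycle) pairs for two engines,
# mimicking the layout of the C-MAPSS training set described above.
unit = np.array([1, 1, 1, 1, 2, 2, 2])
cycle = np.array([1, 2, 3, 4, 1, 2, 3])

# In the training set each unit runs until failure, so the RUL label at any
# cycle is that unit's final cycle minus the current cycle.
max_cycle = {u: cycle[unit == u].max() for u in np.unique(unit)}
rul = np.array([max_cycle[u] - c for u, c in zip(unit, cycle)])
print(rul)  # [3 2 1 0 2 1 0]
```

For the truncated test trajectories this computation is not possible; there the ground-truth RUL at the truncation point is supplied separately with the dataset.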
penalty coefficient is larger. Predicting the RUL value earlier allows for earlier maintenance
planning and avoidance of potential losses.
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} d_i^2} \tag{12}$$

$$\mathrm{Score} = \begin{cases} \sum_{i=1}^{n} \left( e^{-d_i/13} - 1 \right), & d_i < 0 \\[4pt] \sum_{i=1}^{n} \left( e^{d_i/10} - 1 \right), & d_i \ge 0 \end{cases} \tag{13}$$
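Equations (12) and (13) translate directly into code. Here $d_i$ is taken to be predicted minus actual RUL, so late predictions ($d \ge 0$) incur the heavier $e^{d/10}$ penalty; the sample errors are illustrative.

```python
import numpy as np

def rmse(d):
    """Equation (12): root-mean-square of the prediction errors d_i."""
    d = np.asarray(d, dtype=float)
    return float(np.sqrt(np.mean(d ** 2)))

def score(d):
    """Equation (13): asymmetric penalty, harsher for late predictions (d >= 0)."""
    d = np.asarray(d, dtype=float)
    return float(np.sum(np.where(d < 0,
                                 np.exp(-d / 13.0) - 1.0,
                                 np.exp(d / 10.0) - 1.0)))

errors = np.array([-10.0, 0.0, 10.0])   # d = predicted RUL - actual RUL (toy values)
print(round(rmse(errors), 3))           # 8.165
print(round(score(errors), 3))          # 2.876
```

Note the asymmetry: the early error of -10 contributes about 1.158 to the score while the late error of +10 contributes about 1.718, even though both contribute equally to the RMSE.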
RMSE/Score
Method             FD001          FD002             FD003          FD004
MLP                37.56/18,000   80.03/7,800,000   37.39/17,400   77.37/5,620,000
SVR                20.96/1380     42.0/590,000      21.05/1600     45.35/371,000
CNN                18.45/1290     30.29/13,600      19.82/1600     29.16/7890
LSTM               16.14/338      24.49/4450        16.18/852      28.17/5550
ELM                17.27/523      37.28/498,000     18.47/574      30.96/121,000
BiLSTM             13.65/295      23.18/41,300      13.74/317      24.86/5430
Proposed method    12.45/279      15.36/4250        13.37/356      16.24/5220
verify the scientific validity and generalizability of the proposed model. In addition, this research
did not focus on the training time of the model. In later work, we will further optimize the
model structure to shorten the training time so that the model can run quickly while still
achieving satisfactory accuracy.
Author Contributions: Conceptualization, X.W.; writing, manuscript preparation, T.H.; review and
editing, K.Z.; supervision and project management, X.Z. All authors have read and agreed to the
published version of the manuscript.
Funding: This research received no external funding.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: The data of this paper came from the NASA Prognostics Center of
Excellence, and the data acquisition website was: https://ti.arc.nasa.gov/tech/dash/groups/pcoe/
prognostic-data-repository/#turbofan, accessed on 10 February 2022.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Liu, G. A Study on Remaining Useful Life Prediction for Prognostic Applications. Master’s Thesis, University of New Orleans,
New Orleans, LA, USA, 2011.
2. Zio, E.; Di Maio, F. A data-driven fuzzy approach for predicting the remaining useful life in dynamic failure scenarios of a nuclear
system. Reliab. Eng. Syst. Saf. 2010, 95, 49–57. [CrossRef]
3. Heng, A.; Zhang, S.; Tan, A.; Mathew, J. Rotating machinery prognostics: State of the art, challenges and opportunities. Mech.
Syst. Signal Process. 2009, 23, 724–739. [CrossRef]
4. Li, C.J.; Lee, H. Gear fatigue crack prognosis using embedded model, gear dynamic model and fracture mechanics. Mech. Syst.
Signal Process. 2005, 19, 836–846. [CrossRef]
5. Fan, J.; Yung, K.C.; Pecht, M.; Pecht, M. Physics-of-Failure-Based Prognostics and Health Management for High-Power White
Light-Emitting Diode Lighting. IEEE Trans. Device Mater. Reliab. 2011, 11, 407–416. [CrossRef]
6. Daigle, M.; Goebel, K. Model-based prognostics under limited sensing. In Proceedings of the IEEE Aerospace Conference, Big
Sky, MT, USA, 6–13 March 2010; pp. 1–12.
7. Hu, Y.; Baraldi, P.; Di Maio, F.; Zio, E. Online Performance Assessment Method for a Model-Based Prognostic Approach. IEEE
Trans. Reliab. 2015, 65, 1–18.
8. El Mejdoubi, A.; Chaoui, H.; Sabor, J.; Gualous, H. Remaining useful life prognosis of supercapacitors under temperature and
voltage aging conditions. IEEE Trans. Ind. Electron. 2017, 65, 4357–4367. [CrossRef]
9. He, B.; Liu, L.; Zhang, D. Digital twin-driven remaining useful life prediction for gear performance degradation: A review.
J. Comput. Inf. Sci. Eng. 2021, 21, 030801. [CrossRef]
10. Pei, H.; Hu, C.; Si, X.; Zhang, J.; Pang, Z.; Zhang, P. Review of Machine Learning Based Remaining Useful Life Prediction Methods
for Equipment. J. Mech. Eng. 2019, 55, 1–13. [CrossRef]
11. Park, C.; Padgett, W.J. Accelerated degradation models for failure based on geometric Brownian motion and gamma processes.
Lifetime Data Anal. 2005, 11, 511–527. [CrossRef]
12. Chehade, A.A.; Hussein, A.A. A multi-output convolved Gaussian process model for capacity estimation of electric vehicle
li-ion battery cells. In Proceedings of the IEEE Transportation Electrification Conference and Expo (ITEC), Detroit, MI, USA,
19–21 June 2019; pp. 1–4.
13. Van Noortwijk, J.M.; van der Weide, J.A.; Kallen, M.J.; Pandey, M.D. Gamma processes and peaks-over-threshold distributions for
time-dependent reliability. Reliab. Eng. Syst. Saf. 2007, 92, 1651–1658. [CrossRef]
14. Si, X.S.; Wang, W.; Hu, C.H.; Chen, M.Y.; Zhou, D.H. A Wiener-process-based degradation model with a recursive filter algorithm
for remaining useful life estimation. Mech. Syst. Signal Process. 2013, 35, 219–237. [CrossRef]
15. Gebraeel, N.; Lawley, M.; Liu, R.; Parmeshwaran, V. Residual life predictions from vibration-based degradation signals: A neural
network approach. IEEE Trans. Ind. Electron. 2004, 51, 694–700. [CrossRef]
16. Lixin, W.; Zhenhuan, W.; Yudong, F.; Guoan, Y. Remaining life predictions of fan based on time series analysis and BP neural
networks. In Proceedings of the IEEE Information Technology, Networking, Electronic and Automation Control Conference,
Chongqing, China, 20–22 May 2016; pp. 607–611.
17. Liu, Y.; He, B.; Liu, F.; Lu, S.; Zhao, Y.; Zhao, J. Remaining useful life prediction of rolling bearings using PSR, JADE, and extreme
learning machine. Math. Probl. Eng. 2016, 2016, 1–13. [CrossRef]
18. Maior, C.B.S.; das Chagas Moura, M.; Lins, I.D.; Droguett, E.L.; Diniz, H.H.L. Remaining Useful Life Estimation by Empirical
Mode Decomposition and Support Vector Machine. IEEE Lat. Am. Trans. 2016, 14, 4603–4610. [CrossRef]
19. Wang, X.; Jiang, B.; Ding, S.X.; Lu, N.; Li, Y. Extended Relevance Vector Machine-Based Remaining Useful Life Prediction for DC-Link
Capacitor in High-Speed Train; IEEE: Piscataway Township, NJ, USA, 1963; pp. 1–10.
20. Hinton, G.E.; Osindero, S.; Teh, Y.W. A Fast Learning Algorithm for Deep Belief Nets. Neural Comput. 2006, 18, 1527–1554.
[CrossRef] [PubMed]
21. Shah, S.A.A.; Bennamoun, M.; Boussaid, F. Iterative deep learning for image set based face and object recognition. Neurocomputing
2016, 174, 866–874. [CrossRef]
22. Deng, L. Deep learning: From speech recognition to language and multimodal processing. APSIPA Trans. Signal Inf. Process. 2016,
5, e1. [CrossRef]
23. He, M.; He, D. Deep learning based approach for bearing fault diagnosis. IEEE Trans. Ind. Appl. 2017, 53, 3057–3065. [CrossRef]
24. Ren, L.; Cui, J.; Sun, Y.; Cheng, X. Multi-bearing remaining useful life collaborative prediction: A deep learning approach.
J. Manuf. Syst. 2017, 43, 248–256. [CrossRef]
25. Deutsch, J.; He, D. Using deep learning based approaches for bearing remaining useful life prediction. In Proceedings of the
Annual Conference of the PHM Society 2016, Chengdu, China, 19–21 October 2016.
26. Deutsch, J.; He, M.; He, D. Remaining useful life prediction of hybrid ceramic bearings using an integrated deep learning and
particle filter approach. Appl. Sci. 2017, 7, 649. [CrossRef]
27. Li, X.; Ding, Q.; Sun, J.Q. Remaining useful life estimation in prognostics using deep convolution neural networks. Reliab. Eng.
Syst. Saf. 2018, 172, 1–11. [CrossRef]
28. Zhu, J.; Chen, N.; Peng, W. Estimation of bearing remaining useful life based on multiscale convolutional neural network. IEEE
Trans. Ind. Electron. 2018, 66, 3208–3216. [CrossRef]
29. Heimes, F.O. Recurrent neural networks for remaining useful life estimation. In Proceedings of the International Conference on
Prognostics and Health Management, Denver, CO, USA, 6–9 October 2008; pp. 1–6.
30. Pascanu, R.; Mikolov, T.; Bengio, Y. On the difficulty of training recurrent neural networks. In Proceedings of the International
Conference on Machine Learning. Proc. Mach. Learn. Res. 2013, 28, 1310–1318.
31. Wu, Y.; Yuan, M.; Dong, S.; Lin, L.; Liu, Y. Remaining useful life estimation of engineered systems using vanilla LSTM neural
networks. Neurocomputing 2018, 275, 167–179. [CrossRef]
32. Zhao, C.; Huang, X.; Li, Y.; Li, S. A Novel Cap-LSTM Model for Remaining Useful Life Prediction. IEEE Sens. J. 2021, 21,
23498–23509. [CrossRef]
33. Zhang, H.; Zhang, Q.; Shao, S.; Niu, T.; Yang, X. Attention-based LSTM network for rotatory machine remaining useful life
prediction. IEEE Access 2020, 8, 132188–132199. [CrossRef]
34. Sun, H.; Cao, D.; Zhao, Z.; Kang, X. A hybrid approach to cutting tool remaining useful life prediction based on the Wiener
process. IEEE Trans. Reliab. 2018, 67, 1294–1303. [CrossRef]
35. Chen, C.P.; Liu, Z. Broad Learning System: An Effective and Efficient Incremental Learning System without the Need for Deep
Architecture. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 10–24. [CrossRef]
36. Gong, X.; Zhang, T.; Chen, C.P.; Liu, Z. Research Review for Broad Learning System: Algorithms, Theory, and Applications; IEEE:
Piscataway Township, NJ, USA, 1963; pp. 1–29.
37. Sateesh Babu, G.; Zhao, P.; Li, X.L. Deep convolutional neural network based regression approach for estimation of remaining
useful life. In International Conference on Database Systems for Advanced Applications; Springer: Berlin/Heidelberg, Germany, 2016;
pp. 214–228.
38. Zheng, S.; Ristovski, K.; Farahat, A.; Gupta, C. Long short-term memory network for remaining useful life estimation.
In Proceedings of the 2017 IEEE International Conference on Prognostics and Health Management (ICPHM), Dallas, TX, USA,
19–21 June 2017; pp. 88–95.
39. Zhang, C.; Lim, P.; Qin, A.K.; Tan, K.C. Multiobjective deep belief networks ensemble for remaining useful life estimation in
prognostics. IEEE Trans. Neural Netw. Learn. Syst. 2016, 28, 2306–2318. [CrossRef]
40. Wang, J.; Wen, G.; Yang, S.; Liu, Y. Remaining useful life estimation in prognostics using deep bidirectional lstm neural network.
In Proceedings of the 2018 Prognostics and System Health Management Conference (PHM-Chongqing), Chongqing, China,
26–28 October 2018.