
www.nature.com/scientificreports

Fault diagnosis in electric motors using multi-mode time series and ensemble transformers network

Bo Xu1, Huipeng Li1,2, Ruchun Ding1 & Fengxing Zhou2
Induction motors are essential in industrial production, and their fault diagnosis is vital for ensuring
continuous and efficient equipment operation. Minimizing downtime losses and optimizing
maintenance costs are key to maintaining smooth production and enhancing economic efficiency.
This paper presents a novel diagnostic approach for diverse motor faults, integrating time series
analysis, Transformer-based networks, and multi-modal data fusion. Firstly, multiple signals such as
three-phase current, vibration, device sound, and ambient sound are collected to form a multi-modal
dataset. Subsequently, a Transformer network for single time series classification is developed, and
multiple instances are concatenated in parallel to create an ensemble Transformer network. The self-
attention mechanism is then utilized to dynamically integrate features from different modal data for
accurate motor fault identification. During network training, the chaotic WOA optimizes the ensemble
Transformer network’s hyper-parameters. Finally, the proposed method is trained and tested on a
motor measurement multi-modal dataset. Experimental results show that it performs outstandingly
on multi-modal datasets, attaining a high diagnostic accuracy of 99.10%. Compared with single-mode
data and state-of-the-art methods, it demonstrates superior diagnostic accuracy and reliability.

Keywords Induction motor, Fault diagnosis, Ensemble transformers network, Chaotic whale optimization
algorithm, Multi-mode

As a common industrial driving device, the reliability and stability of electric motors are crucial for production.
But due to long-term use and external factors, motors may experience various faults like bearing issues,
rotor fractures, and stator short circuits1. Fault features are diverse and have strong nonlinear relationships2.
Consequently, extracting these features is tough, requiring much data and causing high algorithm complexity,
which often leads to lower fault type recognition accuracy. Currently, many scholars have proposed numerous
motor fault diagnosis methods. Chen et al.3 in 2019 presented an intelligent induction motor fault diagnosis
system using a CMAC, which diagnoses faults quickly and accurately via spectrum analysis of vibration signals. But in
practical work settings it may be unstable and vulnerable to external factors, reducing confidence in the diagnosis results.
Gyftakis introduced an innovative methodology for detecting broken bar faults in induction motors through the
monitoring of electromagnetic torque4. The authors first analytically evaluated the harmonic component of the
electromagnetic torque in a normal induction motor, considering saturation, stator and rotor magnetic flux
harmonics, and slot effects. Then, they used finite element analysis to study the effect of broken bar faults on the
spatial electromagnetic properties of induction motors by creating and simulating healthy and faulty models.
This analysis revealed higher-order harmonic components in the electromagnetic torque spectrum related to
broken bar faults. Also, they checked the applicability of the proposed fault characteristics in different induction
motors and verified the finite element results with experiments. But diagnosis models based on finite element
simulations have weak generalization. Gao et al.5 discussed PMSM inter-turn short-circuit fault and proposed
a current/voltage-based detection strategy. But current/voltage features are affected by not only the fault but
also other interferences, potentially harming fault diagnosis accuracy and reliability. Contreras-Hernandez et al.
employed a signal geometric analysis method to infer multiple fault characteristics for induction machines6. This
approach represents signal data as points in a multidimensional space. Geometric analysis of these point sets
yields signal characteristics like fault type and location. But in geometric analysis, different fault features may
overlap, increasing fault diagnosis complexity. Atta et al. published a review article discussing fault detection and
diagnosis technologies for induction motors and their drives7. This paper describes the latest research
advances and trends in induction motor and driver fault detection and diagnosis, providing valuable insights for

1School of Physics and Electronic Information, Huanggang Normal University, Huanggang 438000, China. 2School
of Information Science and Engineering, Wuhan University of Science and Technology, Wuhan 438001, China.
email: yeshuip@163.com

Scientific Reports | (2025) 15:7834 | https://doi.org/10.1038/s41598-025-89695-6 1



related research and applications. Zeng et al. proposed an online rotor fault diagnosis method based on stator
tooth flux8. This approach detects rotor faults by stator flux shifts. Fault diagnosis is done with a constructed
model and Kalman filter. But stator flux is affected by motor state and load, adding noise and making fault feature
extraction hard. Contreras-Hernandez et al., proposed a quaternion signal analysis algorithm for the detection
of induction motor faults9. However, this algorithm has practical limitations. It needs much input current and
output voltage data, requiring a high-performance acquisition system. Also, its accuracy in feature extraction
and fault detection depends on parameter selection and model construction precision. Zhou et al., proposed a
motor torque fault diagnosis strategy for four-wheel independent motor-driven vehicles using the unscented
Kalman filter10. It diagnoses motor torque faults accurately via collecting vehicle sensor data and estimating/
predicting motor torque with the filter. But accurate diagnosis needs real-time, high-quality data acquisition and
transmission, which may be affected by sensor precision, signal transmission delays and interference in practice.
Martin-Diaz et al., proposed a strategy to evaluate the performance of various machine learning techniques for
motor fault diagnosis, offering a valuable reference for practical application scenarios11. However, due to the
limitations of the experimental conditions, these results require further exploration and validation.
Recently, deep learning has made significant progress in the field of motor fault diagnosis. For example, Kullu
et al. utilized convolutional neural networks (CNNs) and Long Short-Term Memory networks (LSTMs)
to process sensor data12. A CNN extracts spatial features from sensor data and an LSTM captures time-series
features. Integrating them utilizes both spatial and temporal info. Lang et al. discussed AI in EV motor fault
detection, like neural nets, fuzzy logic and genetic algorithm13. Liu et al., proposed a multi-scale kernel-based
residual convolutional neural network (MKRCNN) to address the issue of motor fault diagnosis under non-
stationary conditions14. In this method, wavelet packet transform decomposes non-stationary motor vibration
signals into sub-band signals of different scales. Then, each sub-band signal is processed by RCNNs to extract
features and learn high-level representations. Wang15 proposed a lightweight multi-sensor fusion model for
induction motor fault diagnosis, in contrast to methods that focus on single-mode data like vibration, sound or
current. In this model, a CNN processes visual data and an LSTM processes auditory data, and integrating their outputs gives
more accurate diagnosis results. Kang et al.16 proposed an edge-based real-time motor fault diagnosis solution
using an efficient CNN, achieving real-time and low-latency diagnosis by deploying models on edge devices.
Sun et al.17 proposed a deep learning technology for induction motor fault identification and classification. It
uses sparse autoencoders to learn low-dimensional feature representations from high-dimensional data. The
learned features train a deep neural network for accurate fault classification, leveraging the autoencoder’s feature
extraction and dimensionality reduction strengths for efficient motor fault discrimination. Hoang18 proposed
a bearing fault diagnosis method based on motor current signals. The sensor captures the current signal
during motor operation, then a CNN extracts features and classifies the fault. Information fusion technology
is integrated to ensure accuracy by combining the current signal with other relevant data like vibration and
temperature signals. Ribeiro19 proposed a method for motor fault detection and diagnosis using multi-channel
vibration signals and 1D CNNs. It starts by collecting motor vibration signals via accelerometers, converting
them into multi-channel data. Then a 1D CNN trains on the data to automatically extract features and learn
to distinguish normal and faulty states. Attestog20 proposed a robust active learning method for multi-fault
diagnosis, which can accurately diagnose induction motor drive systems under dynamic conditions and with
unbalanced datasets. Kao21 proposed two feature extraction methods for PMSM fault diagnosis: one based on
wavelet packet transform and the other a deep 1D CNN with softmax layers. Both can accurately diagnose
PMSM faults of different severities at various operating velocities. Li22 proposed a method using deep CNNs
and image recognition for diagnosing rotor magnetization and eccentricity faults in PMSMs. The deep CNNs
analyze motor images to detect such issues, providing high accuracy and reliability for motor fault diagnosis
and maintenance. Zhang et al., proposed an inferential deep distillation attention network for detecting multiple
motor bearing faults23. In this network, a CNN first extracts features from original vibration signals. Then an
attention mechanism makes the model focus on key features. Finally, knowledge distillation techniques transfer
its knowledge to a smaller, more efficient model for rapid inference and diagnosis. Chen24 proposed a capsule
network-based data-driven method for smart motor multiple fault detections. Capsule networks can capture
image spatial hierarchies better than CNNs and are used to extract features for more accurate detection. But
it’s a “black box” with an unexplainable decision-making process, challenging its practical application. Also,
the study relies mainly on simulated data while real data may have noise and anomalies that can affect the
model’s generalization ability. Principi25 proposed an unsupervised motor fault detection strategy with deep
autoencoders. It captures motor’s normal vibrational patterns and identifies fault signals via unsupervised
learning. Not relying on labeled fault data, it’s more flexible and convenient than traditional supervised methods.
Wang26 applied a CCNN for motor fault diagnosis under non-stationary conditions. The method uses progressive
optimization techniques to adjust network parameters during training for different operating conditions and
fault types. Stepwise fine-tuning helps the CCNN adapt to non-stationary motor signals and improve diagnosis
accuracy. Khanjani27 proposed a deep learning method for heatmap feature extraction in three-phase induction
motor electrical fault detection. They introduced heatmap obtaining, detailed network structure and training.
Multi-layer CNNs extract features from heatmaps and a fully connected layer does fault classification. It offers an
innovation for motor fault detection and shows deep learning’s industrial application potential. Jang28 proposed
an advanced method combining vibration data processing and deep learning preprocessing for high-precision
motor fault diagnosis. Wavelet packet decomposition extracts frequency features, adaptive filters augment
them, RNNs model feature sequences, and an SVM classifier diagnoses faults using the preprocessed data. The
integrated strategy improves diagnosis accuracy. Zhu29 proposed an intelligent edge system for real-time motor
rotor quality detection. Using edge computing and AI, it monitors and detects issues during production via
sensors and cameras. Delegating inspection tasks to edge devices cuts costs and latency, enhancing production
efficiency and quality in the motor manufacturing industry.


However, the deep belief network (DBN), as an unsupervised method, has low data requirements and strong
feature extraction but its performance is hard to ensure. CNNs have local perception and other advantages yet
have many parameters, limited input sizes, lack global information and need much data. RNNs are good at processing
sequence data but have gradient and long-term dependence issues. LSTM solves some problems of the traditional
RNN with a gating mechanism but still faces challenges like many parameters and high complexity. RCNNs
address the vanishing gradient issue via residual connections to improve training and modeling. Nonetheless,
their large parameter number and data requirements may raise computational and storage costs and need
regularization to avoid overfitting. Also, previous studies rely on single-modal data, which has problems
like incomplete info, feature redundancy and poor robustness in fault diagnosis. So, multi-modal data fault
diagnosis methods are now a research focus. For example, Xu30 elaborated the background and motivation of
multimodal learning, emphasizing that different types of information in multimodal data are complementary
and valuable, enhancing machine learning tasks. A multi-modal data enhancement framework31 was proposed
to boost emotion classification task performance. It enriches training data via multiple augmentation techniques
and enhanced multi-modal inputs. Dai32 discussed multi-modal data fusion as a research method from the
perspective of information theory, aiming to improve the integration and utilization efficiency of multiple
different modal data. Mu33 discussed the challenges faced by multi-modal data in the field of learning analysis
and the future research direction. Qi34 proposed a multi-modal sentiment analysis method utilizing a multimodal
coding-decoding network with a structure similar to the Transformer. Pawlowski35 focused on comparing effective
multi-modal data fusion techniques. The authors summarize them and do an in-depth comparison. The study
suggests exploring combinations of different techniques for better multi-modal data fusion effectiveness and
performance in the future. Ma36 proposed a method using deep-coupled auto-encoders for multi-modal sensing
data to diagnose rotating machinery faults. It fuses multiple data modalities and is significant in multi-modal data
fusion and fault diagnosis fields. Li et al.37 expounded on multi-modal data’s significance and application fields,
discussed common factor analysis methods in its integration analysis, and emphasized integrating multiple data
modalities to improve machine learning tasks. Sleeman introduced an innovative classification system to address
challenges in harmonizing terminology and framework descriptions for multi-modal classification models38.
Despite advances, multi-modal data’s application in motor fault diagnosis is still limited, highlighting the urgent
need and potential value of integrating it more extensively in motor fault detection.
Based on analysis, this paper proposes a joint time-series ensemble Transformer network and multi-
modal data fusion method for motor fault diagnosis. To optimize the ensemble Transformer network’s
hyperparameters, the whale optimization algorithm (WOA) is introduced as it’s effective in finding global or
near-optimal solutions39,40. However, the traditional WOA is prone to local optimization and has an imbalance
between global search and local exploitation. So, this paper improves it by introducing a modified 3D logistic-
sinusoidal complex chaotic map to enhance its global search and convergence41. The modified WOA is used to
optimize the ensemble Transformer network’s hyper-parameters for better classification performance in motor
fault identification. The proposed approach has the following main innovations.

(1) Various sensors are utilized to collect three-phase current signals, three-axis vibration signals, two-dimen-
sional sound signals, and one-dimensional ambient noise signals. These multi-modal time-series signals,
obtained from different sensors, will serve directly as training and test datasets for the diagnostic model.
This approach ensures that the model can fully capture information about the motor’s operational status
and potential failures.
(2) A parallel Transformer network based on time series has been constructed, specifically designed for the
ensemble Transformer network. This network is tailored for training and testing multi-modal motor fault
data, aiming to achieve robust and accurate classification results.
(3) The WOA was introduced with the incorporation of 3D Chaotic composite maps to enhance its global
search ability and convergence rate. This algorithm has also been utilized to optimize the hyperparameters
of the ensemble Transformer network, thereby achieving optimal classification performance.

The remainder of the paper is organized as follows. Section "Basic methods" describes the basic approach;
Section "Proposed methods" describes the proposed method; in Section "Example verification", the proposed
method is experimentally validated; and the final section summarizes and discusses the proposed method.

Basic methods
In this chapter, three notable techniques are briefly introduced, namely Transformer networks, the whale
optimization algorithm, and Chaotic mapping. By understanding the fundamentals of these techniques, we can
better appreciate their essential role in problem solving.

Transformer network
Architectures based on self-attention, especially Transformers42, have gradually become the preferred model
in the field of natural language processing (NLP). Similar to the structure of the traditional seq2seq model,
Transformer also consists of an encoder and a decoder. Specifically, the structure of the Transformer model
is shown in Fig. 1, where each encoder module comprises multiple independent attention modules,
feedforward networks, residual connections and layer normalization modules, which together construct
a complete coding system for efficient information transmission and processing. To enhance the model
performance, the Transformer introduces residual connections around each module and integrates them into a
single layer-normalized module. The aim of this move is to prevent model performance degradation and ensure
excellent training results even when the model has a large number of layers. With the normalization layer, data


Fig. 1. Algorithm structure diagram for the transformers.

feature distribution can be effectively normalized, allowing the Softmax activation function to perform more
effectively and thereby improving stability.
Moreover, the multi-head self-attention module consists of multiple self-attention modules which play a
crucial role in capturing the relationships between the currently generated sequence and the previously generated
text. This module projects the matrices representing the query (Q), key (K), and value (V) through a number
of different linear transformations, and then concatenates the various self-attention results as follows:

MultiHead(Q, K, V) = Concat(head_1, ..., head_h) W^O    (1)

where

head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V)    (2)


where W_i^Q ∈ R^{d_model×d_k}, W_i^K ∈ R^{d_model×d_k}, W_i^V ∈ R^{d_model×d_v} and W^O ∈ R^{h·d_v×d_model} are the parameter
matrices. Moreover, the self-attention is computed using the scaled dot product, i.e.

Attention(Q, K, V) = softmax(Q K^T / √d_k) V    (3)
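As an illustration, the multi-head attention of Eqs. (1)-(3) can be sketched in NumPy; the shapes and random weights below are illustrative stand-ins, not the paper's trained parameters:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Eq. (3): Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)
    return softmax(scores) @ V

def multi_head_attention(X, W_Q, W_K, W_V, W_O, h):
    # Eqs. (1)-(2): project X into h heads, attend per head,
    # concatenate the head outputs, then apply the output projection W_O.
    heads = [scaled_dot_product_attention(X @ W_Q[i], X @ W_K[i], X @ W_V[i])
             for i in range(h)]
    return np.concatenate(heads, axis=-1) @ W_O
```

Each head's attention weights sum to one per query, and concatenating h heads of width d_k followed by W_O ∈ R^{h·d_k×d_model} returns the sequence to model width.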

Figure 2 illustrates the structure of the scaled dot-product attention and multi-head attention modules. The
feedforward network maps the vector obtained from the multi-head attention mechanism to the desired
dimension, primarily consisting of two linear transformations with a ReLU activation function in between.
Its operation is as follows:

FFN(x) = max(0, x W_1 + b_1) W_2 + b_2    (4)
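A minimal sketch of this position-wise feed-forward network, with ReLU (max(0, ·)) between the two linear maps; W_1, b_1, W_2, b_2 stand in for trained parameters:

```python
import numpy as np

def feed_forward(x, W1, b1, W2, b2):
    # Eq. (4): FFN(x) = max(0, x W1 + b1) W2 + b2
    # Applied identically at every sequence position (like a 1x1 convolution).
    return np.maximum(0.0, x @ W1 + b1) @ W2 + b2
```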

While the linear transformation rules are consistent across different locations, each layer employs distinct
parameter settings. Since each state in the sequence is updated independently, this is analogous to performing
a 1 × 1 convolution operation. Although the decoder and encoder have similar structures, there are some
differences between them. First, the decoder module comprises two multi-head attention layers. The initial
multi-head attention layer masks the input data, while the second one utilizes the encoded information matrix
output by the encoder when computing the key and value matrices. Concurrently, the query matrix is derived
from the output of the preceding decoder module. Additionally, the decoder employs a Softmax layer to compute
the probability of the next predicted observation point before outputting. This architecture enables the model to
more effectively understand and process sequential data, leading to enhanced performance in natural language
processing and other related tasks.

Whale optimization algorithm


WOA43 demonstrates excellent optimization capabilities by simulating the hunting behavior of humpback
whales, particularly their bubble net attacks. The bubble net attack is a specialized hunting technique employed
by humpback whales, which involves surrounding and capturing prey by emitting bubbles along a spiral path.
This process is simulated by the WOA, as shown in Fig. 3. The WOA consists of three phases:
encircling prey, the bubble-net attack (exploitation), and searching for prey (exploration).
Moreover, the WOA is advantageous because it is simple to implement, has few control parameters,

Fig. 2. Structure of scaled dot product attention and multi-head attention modules. (a) Scaled dot product
attention; (b) Multi-head attention module composed of attention layers.


Fig. 3. Spiral bubble net foraging strategy of the humpback whale.

and is less likely to become trapped in local optima. These attributes have made it a focal point of interest in the
field of swarm intelligence optimization algorithms and a current research hot-spot.

Surround the prey


In the initial phase of the hunt, since the location of the optimal individual in the search space is not yet known,
the WOA treats the current best candidate solution as the target prey. Once the best search agent is identified, the
other search agents update their positions based on the coordinate of the best search agent. The specific position
update expression is as follows.
D = |C · X*(t) − X(t)|
X(t + 1) = X*(t) − A · D    (5)

where t is the current iteration number, X*(t) is the position vector of the current best solution, X(t) is the
position vector of the individual whale, and D is the distance between an individual whale and its prey. A and C are
coefficient vectors that are used to control the swimming pattern of the whale and have the following properties:

A = 2a · r_1 − a
C = 2 · r_2    (6)
a = 2 − 2t / T_max

where r_1, r_2 are random numbers in [0, 1], a is the control parameter, t is the number of current iterations, and
T_max denotes the maximum number of iterations.

Hunting stage
During the hunting phase, the WOA simulates patterns of whale hunting behavior, searching for potential
solutions within the enclosed area. This behavioral pattern helps the algorithm to approach the optimal solution
more rapidly and to avoid, to some extent, becoming trapped in local optima. The mathematical model is as
follows:

X(t + 1) = D · e^{bl} · cos(2πl) + X*(t)    (7)


where b is a constant used to define the logarithmic spiral shape and l ∈ [−1, 1] is a random number. At the
same time, a random number p ∈ [0, 1] is used to decide which way to update the position; its mathematical model is as follows:

X(t + 1) = X*(t) − A · D,                  if p < 0.5
X(t + 1) = D · e^{bl} · cos(2πl) + X*(t),  if p ≥ 0.5    (8)

Search for prey


By simulating the behavior of a whale searching for prey, the algorithm employs a global search strategy to
escape local optima and explore a broader solution space. The mathematical model of this process is as follows:
D = |C · X_rand(t) − X(t)|
X(t + 1) = X_rand(t) − A · D    (9)

where X_rand(t) is the position vector of a randomly selected whale individual.
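The three phases of Eqs. (5)-(9) can be combined into a single position-update step, sketched below in NumPy. The switch between encircling and random search on |A|, and the encircle/spiral split on p, follow the standard WOA; treating the vector condition |A| < 1 with np.all is one possible reading:

```python
import numpy as np

def woa_step(X, X_best, t, T_max, b=1.0, rng=None):
    """One WOA position update, Eqs. (5)-(9).

    X: (n_whales, dim) current positions; X_best: (dim,) best solution so far.
    """
    if rng is None:
        rng = np.random.default_rng()
    n, dim = X.shape
    a = 2.0 - 2.0 * t / T_max                  # Eq. (6): a decays linearly from 2 to 0
    X_new = np.empty_like(X)
    for i in range(n):
        r1, r2 = rng.random(dim), rng.random(dim)
        A = 2.0 * a * r1 - a                   # Eq. (6)
        C = 2.0 * r2
        if rng.random() < 0.5:                 # Eq. (8), first branch
            if np.all(np.abs(A) < 1.0):        # exploitation: encircle prey, Eq. (5)
                D = np.abs(C * X_best - X[i])
                X_new[i] = X_best - A * D
            else:                              # exploration: random whale, Eq. (9)
                X_rand = X[rng.integers(n)]
                D = np.abs(C * X_rand - X[i])
                X_new[i] = X_rand - A * D
        else:                                  # spiral bubble-net attack, Eq. (7)
            l = rng.uniform(-1.0, 1.0)
            D = np.abs(X_best - X[i])
            X_new[i] = D * np.exp(b * l) * np.cos(2 * np.pi * l) + X_best
    return X_new
```

Iterating this step while tracking the best fitness value yields the basic optimizer; the chaotic variant used later in the paper replaces the pseudo-random draws with chaotic sequences.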

Chaotic mapping
Chaotic mapping44 is characterized by sensitivity to initial values, inherent randomness, boundedness, ergodicity,
and unpredictability. In the field of intelligent optimization, it serves as an alternative to pseudo-random number
generators, often achieving superior results45. Common chaotic mappings include the Logistic mapping46,
Sine mapping47, Tent mapping48, Chebyshev mapping49, ICMIC mapping50, Cubic mapping51, among others.
However, due to the simple structure and chaotic orbit of the one-dimensional chaotic mapping mentioned
above, their ergodic and stochastic performance is relatively limited52. Furthermore, the performance of these
chaotic mapping can be enhanced through cross-composition and the expansion into higher dimensions53.
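As a sketch of how such chaotic sequences can replace a pseudo-random generator (for example, when initializing a WOA population), the one-dimensional logistic and sine maps are easy to generate; the modified 3D composite map used later in the paper is built by crossing and extending maps like these:

```python
import numpy as np

def logistic_map(x0, n, mu=4.0):
    # x_{k+1} = mu * x_k * (1 - x_k); fully chaotic on [0, 1] at mu = 4.
    xs = np.empty(n)
    x = x0
    for k in range(n):
        x = mu * x * (1.0 - x)
        xs[k] = x
    return xs

def sine_map(x0, n, a=1.0):
    # x_{k+1} = a * sin(pi * x_k); maps [0, 1] into [0, 1] for a = 1.
    xs = np.empty(n)
    x = x0
    for k in range(n):
        x = a * np.sin(np.pi * x)
        xs[k] = x
    return xs
```

Two trajectories started from nearly identical seeds diverge after a few dozen iterations, which is the initial-value sensitivity the section describes.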

Proposed methods
This section details the motor fault diagnosis method for the multi-modal time series based on ensemble
Transformer networks. The overall structure, as shown in Fig. 4, mainly comprises a multi-modal data
acquisition module, an ensemble Transformer network, classifier, etc. It is particularly crucial to note that the
ensemble Transformer network is a parallel ensemble of multiple time-series Transformer networks. Moreover,
the performance of each time series Transformer network is influenced by its hyper-parameters. Therefore, in
order to enhance the network performance, a modified WOA algorithm is proposed in this section to identify
the optimal hyper-parameters.

Ensemble transformer network


Multi-modal time series possess diverse data structures and expressions, and a single Transformers network
cannot comprehensively extract their features. Therefore, in this section, an innovative ensemble Transformer
network is proposed, which assigns specialized Transformer networks to different modalities within the multi-
modal time series, creating a parallel ensemble of multiple Transformer networks. Figure 5 illustrates the overall
architecture of the ensemble Transformer network. In Fig. 5a, the entire network is composed of multiple time
series Transformer networks, Add & Norm modules and classifiers. Figure 5b provides a detailed view of the

Fig. 4. Frame diagram of the proposed motor fault diagnosis algorithm.


Fig. 5. Transformers network structure. (a) Consolidated Transformers network. (b) Transformers network
with a single time series.

network structure for a single time series Transformer, which is designed to fully consider the characteristics of
time series data and can effectively extract useful information from complex datasets.
The Multi-modal data acquisition module integrates multiple sensors and data acquisition techniques for
simultaneous collection of different types of data, including sound, vibration, current, and other diverse data
modes.
The time-series Transformer network decomposes multidimensional time series into small, fixed-size
patches, then linearly embeds them, adds positional embeddings, and feeds the resulting sequence of vectors
into a standard Transformer encoder. For classification purposes, the standard approach is utilized here, which
involves adding an additional learnable component, termed a 'classification token,' to the sequence. The detailed
structure is as follows:
(1) Token embedding. The standard Transformer network receives one-dimensional sequences that are
embedded as tokens. However, the time series from different modalities within a multi-modal time series
can be multidimensional, which is the case for time series data x ∈ R^{M×N×C}. Here, M and N denote the
length and dimension of the time series, respectively, and C represents the number of channels. Subsequently,
the data is flattened to x ∈ R^{M×(N×C)} and divided along the time axis into the block sequence
[x^1, x^2, ..., x^M]. A constant latent vector of size D is used in all layers of the Transformer.
Consequently, the block sequence is augmented and mapped to dimension D by a trainable linear projection.
The output of this projection is referred to as the Token embedding.


Algorithm 1. Training the ensemble Transformer network.

(2) Class token. Similar to BERT's [class] token, adding a learnable embedding z_0^0 = x_class to the block
sequence [x^1, x^2, ..., x^M] results in a new embedded block sequence z_0 = [x_class; x^1 E; x^2 E; ...; x^N E],
where E ∈ R^{(N×C)×D}.


(3) Position embeddings. To preserve location information, position embeddings are added to the block
sequence. Here, the standard learnable 1-D position embedding method is employed, and z0 is extended to


obtain z_0 = [x_class; x^1 E; x^2 E; ...; x^N E] + E_position, where E_position ∈ R^{(N+1)×D}. The resulting sequence
of embedding vectors is then used as input to the encoder.
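Steps (1)-(3) above, patching, linear projection, class token, and position embeddings, can be sketched as follows. The patch count A and the exact reshape are one plausible reading of the notation, with E, E_pos, and x_class standing in for trainable parameters:

```python
import numpy as np

def embed_series(x, A, E, E_pos, x_class):
    """Patch a multi-modal series, project, prepend class token, add positions.

    x: (M, N, C) series (length M, dimension N, C channels).
    A: number of patches along the time axis (must divide M).
    E: ((M // A) * N * C, D) projection; E_pos: (A + 1, D); x_class: (D,).
    """
    M, N, C = x.shape
    patches = x.reshape(A, (M // A) * N * C)   # flatten each time patch
    tokens = patches @ E                       # token embeddings, shape (A, D)
    z0 = np.vstack([x_class, tokens])          # prepend the learnable class token
    return z0 + E_pos                          # add 1-D learnable position embeddings
```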
(4) Encoder. This section still utilizes the Transformer encoder, which consists of N Transformer blocks
and takes as input the embedding sequence z0. Each Transformer block includes a Multi-Headed Self-Attention
layer, an Add & Norm layer, and a Feed-Forward Neural Network layer. This can be expressed by the following
formula:

z'_l = LayerNorm(MSA(z_{l−1}) + z_{l−1}),  l = 1, 2, ..., N   (10)

z_l = LayerNorm(FFN(z'_l) + z'_l),  l = 1, 2, ..., N   (11)

where MSA(·) is a Multi-Headed Self-Attention layer and FFN(·) is a Feed-Forward Network layer. Moreover,
because the time series is multi-modal, each modality's data has its own range of features. Each encoder therefore
outputs a hidden state z_l^c with a distinct range of values, and this diversity makes subsequent modules unstable.
A context normalization module is therefore introduced, defined as follows:

z̃_l^c = (z_l^c − mean(β_n z_l^c)) / std(β_n z_l^c)   (12)

where β is a hyper-parameter that determines the weight of z_l^c.
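A minimal sketch of the context normalization of Eq. (12), treating β as a scalar weight (an assumption about how β_n is applied):

```python
import numpy as np

def context_normalize(z, beta):
    """Context normalization of Eq. (12): shift and scale an encoder's
    hidden state by the statistics of its beta-weighted values."""
    s = beta * z
    return (z - s.mean()) / s.std()

z = np.array([1.0, 5.0, 9.0])
print(context_normalize(z, beta=1.0))   # zero-mean, unit-variance when beta = 1
```

With beta = 1 this reduces to ordinary standardization; smaller beta values shrink the subtracted mean while inflating the scale, damping a modality's influence on downstream modules.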

The Self-Attention layer can dynamically focus on specific modalities or points in time based on the importance of
each mode at different times, which enables it to capture relevant information across modalities more effectively.
However, self-attention has time and space complexity proportional to the square of the sequence length, making it
inefficient for lengthy sequences. To enhance the efficiency of the ensemble Transformer network, Learning-To-Hash
Attention (LHA) is employed here as a replacement for full self-attention. The core idea is to learn separate
parameterized hash functions for queries and keys, allowing the sparse pattern in LHA to adapt beyond distance-based
hash functions such as Locality Sensitive Hashing (LSH) or online k-means, and to better accommodate the mismatch
between the query and key distributions. The sparse attention is defined as follows:

H̃_i = Sparse-Attention(Q_i, K, V) = Σ_{j: h_Q(Q_i) = h_K(K_j)} Ā_ij V_j   (13)

where h_K, h_Q: R^{d_h} → [B] are the hash functions of the key and query, and B is the number of hash buckets.
Ā_ij ∝ A_ij is the attention weight satisfying Σ_{j: h_Q(Q_i) = h_K(K_j)} Ā_ij = 1 for all i. By defining
parametrized functions H_K, H_Q: R^{d_h} → R^B, LHA implements the learnable hash functions
h_K, h_Q: R^{d_h} → [B] as follows:

h_Q(Q_i) = argmax_{b ∈ {1, 2, ..., B}} [H_Q(Q_i)]_b   (14)

h_K(K_j) = argmax_{b ∈ {1, 2, ..., B}} [H_K(K_j)]_b   (15)

where H_Q and H_K are arbitrary parameterized functions. Finally, a single linear layer is applied to obtain the
hidden state H̃.
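A toy sketch of the bucketed sparse attention of Eqs. (13)-(15) follows; the dense H_Q and H_K matrices, the loop-based gather, and all sizes are illustrative assumptions rather than the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(2)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def lha_sparse_attention(Q, K, V, HQ, HK):
    """Learning-to-hash attention sketch (Eqs. 13-15): a query attends only
    to the keys that land in the same learned hash bucket."""
    bq = (Q @ HQ).argmax(axis=-1)       # h_Q(Q_i), Eq. (14)
    bk = (K @ HK).argmax(axis=-1)       # h_K(K_j), Eq. (15)
    out = np.zeros((Q.shape[0], V.shape[1]))
    for i in range(Q.shape[0]):
        same = bk == bq[i]              # keys sharing query i's bucket
        if same.any():
            scores = Q[i] @ K[same].T / np.sqrt(Q.shape[1])
            out[i] = softmax(scores) @ V[same]   # normalized weights, Eq. (13)
    return out

L, d, B = 8, 4, 2
Q, K, V = (rng.standard_normal((L, d)) for _ in range(3))
HQ, HK = rng.standard_normal((d, B)), rng.standard_normal((d, B))
print(lha_sparse_attention(Q, K, V, HQ, HK).shape)
```

Because each query only scores the keys in its own bucket, the cost per query shrinks from the full sequence length to the (on average) L/B keys sharing its bucket.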

The Add & Norm layer adds the hidden state output from the attention layer to the input, thereby preserving the
original input information within the network and preventing information loss. The subsequent normalization
step normalizes the results of this addition, aiding the network in converging more smoothly during training and
enhancing the model’s generalization performance.
The Classifier constitutes the bottom layer of the ensemble Transformer network. First, each time series
Transformer is employed to encode and generate a feature representation of the input data. These features
encapsulate contextual information about the input data. Second, the classifier maps the features output by the
feed-forward multi-layer perceptron (MLP) through the activation function, specifically Softmax, to a probability
distribution over the class categories, as follows:

Class(z_N^0) = softmax(GeLU(z_N^0 W_1 + b_1) W_2 + b_2)   (16)

where GeLU(x) = 0.5x(1 + tanh(√(2/π)(x + 0.044715x³))). This Gaussian error linear unit preserves both
the linear and nonlinear parts and is continuously differentiable. Compared with ReLU, GeLU has a nonzero
gradient on the negative interval, which helps to better transfer the gradient and reduces the "dead neuron"
problem during training. z_N^0 is the hidden-layer feature and is used as the output of the encoder for
subsequent classification.
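The classification head of Eq. (16) can be sketched directly (the weights are random placeholders and the layer sizes are assumptions):

```python
import numpy as np

def gelu(x):
    """tanh-form Gaussian Error Linear Unit used in Eq. (16)."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def classify(z, W1, b1, W2, b2):
    """MLP classification head of Eq. (16): softmax(GeLU(z W1 + b1) W2 + b2)."""
    logits = gelu(z @ W1 + b1) @ W2 + b2
    e = np.exp(logits - logits.max())       # numerically stable softmax
    return e / e.sum()

rng = np.random.default_rng(3)
D, H, n_classes = 16, 32, 4                 # sizes are illustrative assumptions
z = rng.standard_normal(D)                  # encoder output z_N^0
p = classify(z, rng.standard_normal((D, H)), np.zeros(H),
             rng.standard_normal((H, n_classes)), np.zeros(n_classes))
print(p.sum())
```

The softmax output sums to one, so each of the four motor-state classes receives a proper probability.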


The model training process of the ensemble Transformer network adheres to the standard Transformer
training procedure, which includes data preprocessing, input embedding, the Transformer architecture, the
self-attention mechanism, attention centroids and masks, forward propagation, loss computation,
back-propagation and parameter updates, training iterations, testing, and inference. The detailed training
procedure is described in Algorithm 1.

3D chaotic composite maps


Numerous intelligent optimization algorithms employ random number strategies to some extent. It is worth noting
that randomness is essential to such intelligence, and the degree of randomness influences the level of
intelligence54. According to Section "Chaotic mapping", one-dimensional chaotic maps can achieve improved
chaotic properties through cross-composition and higher-dimensional extensions. Hua55 described the
two-dimensional Logistic-Sine complex chaotic map, whose Lyapunov exponent is larger than those of the Logistic
and Sine maps. However, the distribution of its chaotic attractors remains inhomogeneous and its Lyapunov
exponent is relatively small. Huang et al.56 proposed the two-dimensional Logistic-Sine-Cosine complex chaotic
map, whose Lyapunov exponent exceeds that of the two-dimensional Logistic-Sine map, but the complexity of its
chaotic orbit is relatively low. Gu et al.57 proposed a 3D Cat map, which has a relatively small Lyapunov
exponent despite its relatively complex chaotic orbital structure. Tang et al.58 demonstrated an improved
three-dimensional Logistic-Sine cascade complex chaotic map, but its Lyapunov exponent is still relatively
small. Sathiyamurthi et al.59 analyzed the 3D Lorenz-Logistic complex chaotic map in detail, which
significantly enhances the nonlinear characteristics of the chaotic map; however, the complexity of its chaotic
orbit needs improvement. Furthermore, the Lyapunov exponent of a composite chaotic map may decrease, or chaotic
behavior may even be lost. Therefore, in order to obtain composite maps with excellent chaotic properties, this
paper introduces a modified 3D logistic-sinusoidal composite chaotic map, whose mathematical expression is as
follows:
x_n = [k·z_n(1 − z_n) + (4 − k)·sin(πk·y_n(1 − y_n))/4] mod 1
y_n = [k·x_n(1 − x_n) + (4 − k)·sin(πk·z_n(1 − z_n))/4] mod 1   (17)
z_n = [k·y_n(1 − y_n) + (4 − k)·sin(πk·x_n(1 − x_n))/4] mod 1

where k is the chaotic control parameter, k ∈ (0, 4), and x_n, y_n, z_n ∈ [0, 1].

x_{n+1} = x_n / (y_n · z_n)
y_{n+1} = y_n / (x_n · z_n)   (18)
z_{n+1} = z_n / (y_n · x_n)

Next, the chaotic orbits and Lyapunov exponents of the modified 3D logistic-sine complex map are plotted
to verify the chaotic nature of the map. As shown in Fig. 6, the chaotic orbit in (Fig. 6a)
is uniformly spread over the entire space. Concurrently, the Lyapunov exponent displayed in (Fig. 6b) also
attains a large value. Together, these two indicators confirm that the composite chaotic map possesses
superior chaotic properties.
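For illustration, the map of Eqs. (17)-(18) can be iterated as below; note that Eq. (18) as stated can leave [0, 1], so this sketch wraps the ratio update with mod 1, which is an assumption:

```python
import numpy as np

def chaotic_step(x, y, z, k=3.99):
    """One iteration of the modified 3D logistic-sine composite map.
    Eq. (17) produces the intermediates; the ratio update of Eq. (18) is
    wrapped with mod 1 here (an assumption) to keep the orbit in [0, 1)."""
    xn = (k * z * (1 - z) + (4 - k) * np.sin(np.pi * k * y * (1 - y)) / 4) % 1
    yn = (k * x * (1 - x) + (4 - k) * np.sin(np.pi * k * z * (1 - z)) / 4) % 1
    zn = (k * y * (1 - y) + (4 - k) * np.sin(np.pi * k * x * (1 - x)) / 4) % 1
    return (xn / (yn * zn)) % 1, (yn / (xn * zn)) % 1, (zn / (yn * xn)) % 1

x, y, z = 0.1, 0.2, 0.3
orbit = []
for _ in range(1000):
    x, y, z = chaotic_step(x, y, z)
    orbit.append((x, y, z))
print(len(orbit))
```

Plotting the collected triples reproduces the kind of space-filling orbit shown in Fig. 6a.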

Chaotic whale optimization algorithm


The chaotic nature of the map helps prevent the WOA from becoming trapped in local optima, thereby
enhancing the efficiency of global optimal solution searches and accelerating the convergence rate. Section
"3D chaotic composite maps" presented a detailed analysis of the modified 3D logistic-sine complex map and
demonstrated its enhanced chaotic properties. Consequently, in this section, we employ this map to improve
the search efficiency and expedite the convergence of the WOA. The specific implementation steps are as follows.

Parameter initialization
(1) Initialize the task dimension D. The population size of the WOA is ND, the maximum number of iterations is T,
and the boundary constraints for the different tasks are BD ∈ [b1, b2]^D.
(2) Initialize the control parameter c of the modified 3D Logistic-Sine complex map and the variables
x1 = rand(·), x2 = rand(·), x3 = rand(·), where rand(·) is a uniformly distributed random function with
rand(·) ∈ (0, 1).
(3) Initialize the initial position X_i^D of each whale in the population, as follows:

X_i^D = [X_1^D, X_2^D, ..., X_N^D]   (19)

CRand_i^D = √((x_{i,D}^{3D})² + (y_{i,D}^{3D})² + (z_{i,D}^{3D})²) / √3   (20)


Fig. 6. Chaotic orbits and Lyapunov exponents for the modified 3D logistic-sine composite map. (a) Chaotic
orbits. (b) Lyapunov index.

X_i^D = [X_1^D = CRand_1^D, X_2^D = CRand_2^D, ..., X_N^D = CRand_N^D]   (21)

where CRand is the spatial distance between a locus point of the chaotic composite map and the origin
(0, 0, 0), normalized so that CRand ∈ [0, 1]. Using CRand to initialize the whale locations enhances the
diversity of the population.

(4) Initialize the random vectors r1, r2 and p of the WOA, with r1, r2, p = CRand.
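Steps (1)-(4) can be sketched as follows; the orbit seed, the omission of the Eq. (18) update, and the affine scaling of CRand into the search box are assumptions:

```python
import numpy as np

def orbit_gen(x=0.1, y=0.2, z=0.3, k=3.99):
    """Generator over the chaotic orbit of Eq. (17) (Eq. (18) omitted here)."""
    while True:
        x, y, z = ((k*z*(1-z) + (4-k)*np.sin(np.pi*k*y*(1-y))/4) % 1,
                   (k*x*(1-x) + (4-k)*np.sin(np.pi*k*z*(1-z))/4) % 1,
                   (k*y*(1-y) + (4-k)*np.sin(np.pi*k*x*(1-x))/4) % 1)
        yield x, y, z

def crand(point):
    """Eq. (20): distance of an orbit point from the origin, divided by
    sqrt(3) so that CRand falls in [0, 1]."""
    return np.sqrt(sum(v * v for v in point)) / np.sqrt(3.0)

def init_positions(n_whales, dim, lb, ub):
    """Eqs. (19)-(21): seed every whale coordinate with a CRand value,
    scaled into the search box [lb, ub] (the scaling is an assumption)."""
    g = orbit_gen()
    cr = np.array([[crand(next(g)) for _ in range(dim)] for _ in range(n_whales)])
    return lb + cr * (ub - lb)

X = init_positions(n_whales=5, dim=7, lb=-100.0, ub=100.0)
print(X.shape)
```

Because the chaotic orbit covers the unit cube fairly uniformly, the resulting population spreads over the whole search box instead of clustering.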

Hunting process
Surround the prey. In the standard WOA, the parameter vector A in Eq. (6) determines the search capability of the
WOA. Adding a random component, denoted CRand, can enhance the nonlinear complexity of A.
However, the control parameter a in A is linear, which can limit the search capability of the WOA to some extent.
Therefore, a nonlinear parameter a is adopted here, as follows:

a = 2 − 2 × sin(t/T_max × π/2 + φ)   (22)

where t is the current iteration number, T_max is the maximum number of iterations, and φ is a random
disturbance with φ ∈ (0, 1).

Furthermore, the coefficient vectors A and C in the position update equation dictate how the whales swim as
they circle their prey. To enhance their stochasticity, a strategy based on the chaotic composite map is employed
in this section, as follows:

a = 2 − 2 × sin(t/T_max × π/2 + X_{i,D}^{3D})
A = 2 × a × Y_{i,D}^{3D} − a   (23)
C = 2 × Z_{i,D}^{3D}

where X_i^{3D}, Y_i^{3D} and Z_i^{3D} are the variables of the modified 3D Logistic-Sine complex map, and they
all belong to [0, 1].
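The coefficient computation of Eqs. (22)-(23) is a one-liner per coefficient; the sketch below treats the chaotic draws as scalars for clarity:

```python
import numpy as np

def coefficients(t, t_max, x3d, y3d, z3d):
    """Chaos-driven WOA coefficients of Eqs. (22)-(23): the nonlinear decay
    parameter a, and the vectors A and C that steer the encircling move.
    x3d, y3d, z3d are draws from the modified 3D Logistic-Sine map."""
    a = 2.0 - 2.0 * np.sin(t / t_max * np.pi / 2.0 + x3d)
    A = 2.0 * a * y3d - a
    C = 2.0 * z3d
    return a, A, C

a, A, C = coefficients(t=0, t_max=200, x3d=0.0, y3d=0.5, z3d=0.5)
print(a, A, C)   # 2.0 0.0 1.0
```

At t = 0 with a zero chaotic phase, a starts at its maximum of 2 and decays nonlinearly toward 0 as t approaches T_max, shifting the whales from exploration to exploitation.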

Hunting stage. At this stage, the WOA employs a spiral position-update mechanism, which has demonstrated
excellent results in various applications. Here, the spiral update mechanism is extended from 2D to 3D based
on whale hunting behavior, which is consistent with the natural hunting patterns of whales. Therefore, a 3D
logarithmic-spiral update rule is proposed, as follows:



X_Spiral = e^{bl} × cos(2πl)
Y_Spiral = e^{bl} × sin(2πl)   (24)
Z_Spiral = e^{bl}

where b is a constant that defines the shape of the logarithmic spiral and l ∈ [−1, 1] is a random number; the
3D spiral curve is shown in (Fig. 7). In Fig. 7, the space point M(0, 0, 0) spirals up to the space point
M(X_Spiral, Y_Spiral, Z_Spiral), and the distance between the two points in space is:

||M_distance||_2^2 = X_Spiral^2 + Y_Spiral^2 + Z_Spiral^2   (25)

Then, the spiral update mechanism of standard WOA is enhanced using Eq. (25), as follows:

X(t + 1) = D × l × ||M_distance||_2^2 + X*(t),  p ≥ 0.5   (26)
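A sketch of the 3D spiral update of Eqs. (24)-(26); the vector-valued distance term d and its elementwise use are assumptions about how Eq. (26) is applied per dimension:

```python
import numpy as np

def spiral_update(x_best, d, l, b=1.0):
    """3D logarithmic-spiral position update of Eqs. (24)-(26); d stands in
    for the distance term D of Eq. (26) (the naming is an assumption)."""
    r = np.exp(b * l)
    m = np.array([r * np.cos(2 * np.pi * l),      # X_spiral
                  r * np.sin(2 * np.pi * l),      # Y_spiral
                  r])                             # Z_spiral
    m_sq = float(np.sum(m ** 2))                  # ||M_distance||_2^2, Eq. (25)
    return d * l * m_sq + x_best                  # Eq. (26), applied when p >= 0.5

x_new = spiral_update(x_best=np.zeros(3), d=np.ones(3), l=0.0)
print(x_new)
```

At l = 0 the spiral term vanishes and the whale stays at the best-known position; larger |l| values move it along progressively wider turns of the spiral.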

Search for prey. When |A| ≥ 1, the whale population is updated based on a whale randomly selected according
to Eq. (9); this strategy helps avoid falling into local optima to some extent. However, if the early
search deviates significantly from the target value, the later search may become trapped in a local optimum.
To further reduce the likelihood of the WOA getting stuck in local optima, a Cauchy variational strategy is
introduced at this stage; the mathematical model is presented below:

D_Rand = Cauchy ⊕ |C · X_CRand(t) − X(t)|
X(t + 1) = Cauchy ⊕ X_CRand(t) − A · D_Rand   (27)

where Cauchy is the Cauchy operator, and the probability density function of the standard one-dimensional
Cauchy distribution is given as follows:

f(x) = (1/π) · 1/(x² + 1),  −∞ < x < +∞   (28)

As can be seen from Eq. (28), the Cauchy distribution has heavy tails, which give individuals a higher
probability of jumping to distant, potentially better positions and escaping local optima. Additionally, its
lower central peak means the Cauchy operator spends less effort exhaustively searching the neighbourhood of
the current solution. Therefore, introducing the Cauchy operator, which mutates individual positions to
generate diverse solutions, improves the convergence of the WOA by increasing the likelihood of escaping local
optima. The pseudocode of the algorithm is presented in Algorithm 2.
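The Cauchy mutation of Eq. (27) can be sketched as follows; interpreting the '⊕' operator as elementwise scaling by a standard Cauchy draw is an assumption:

```python
import numpy as np

rng = np.random.default_rng(4)

def cauchy_mutation(x, x_rand, A, C):
    """Cauchy-perturbed search step of Eq. (27). The heavy-tailed draw from
    the standard Cauchy distribution allows occasional long jumps that help
    the whale escape a local optimum."""
    cauchy = rng.standard_cauchy(x.shape)
    d_rand = cauchy * np.abs(C * x_rand - x)   # D_Rand of Eq. (27)
    return cauchy * x_rand - A * d_rand        # mutated position X(t+1)

x_new = cauchy_mutation(x=np.zeros(7), x_rand=np.ones(7), A=0.5, C=1.0)
print(x_new.shape)
```

In practice the mutated position would be clipped back to the task's boundary constraints before evaluation.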

Performance analysis
To thoroughly verify the search capability of the proposed 3D-Chaotic WOA method, the CEC2022 optimization
function test suite is utilized in this section for evaluation. The suite comprises 12 single-objective
test functions with boundary constraints: a unimodal function (F1), basic multimodal functions (F2-F5),
hybrid functions (F6-F8) and composition functions (F9-F12), as outlined in (Table 1). All functions have a
search range of [−100, 100]^D, where D represents the dimensionality of the space.
To validate the effectiveness and superiority of the proposed Chaotic WOA algorithm, this section compares
and analyzes its performance against other modified WOA algorithms, such as QINWOA, AIBWOA, AdBet-WOA and
HWOA-CHM.

Fig. 7. Logarithmic spiral update mechanism.

Algorithm 2. Chaotic Whale optimization algorithm.

The comparative results are displayed in (Figs. 8 and 9). As observed from the convergence curves in Fig. 8
and the box plots in Fig. 9, the 3D-Chaotic WOA exhibits significantly superior performance in terms of
convergence speed and data concentration compared with the aforementioned methods. Nevertheless, the
convergence for the F2 function in Fig. 8 is not optimal, and the data distribution for the F8 function in
Fig. 9 appears anomalous. This indicates that no single optimization algorithm can be universally effective
for all types of functions or optimization problems.


Type         ID   Functions                                              fmin
Unimodal     F1   Shifted and full rotated zakharov                      300
Basic        F2   Shifted and full rotated rosenbrock's                  400
             F3   Shifted and full rotated expanded schaffer's f6        600
             F4   Shifted and full rotated non-continuous rastrigin's    800
             F5   Shifted and full rotated levy                          900
Hybrid       F6   Hybrid 1 (N = 3)                                       1800
             F7   Hybrid 2 (N = 6)                                       2000
             F8   Hybrid 3 (N = 5)                                       2200
Composition  F9   Composition 1 (N = 5)                                  2300
             F10  Composition 2 (N = 4)                                  2400
             F11  Composition 3 (N = 5)                                  2600
             F12  Composition 4 (N = 5)                                  2700

Table 1. Test set of optimization functions for CEC2022.

Furthermore, to elucidate the performance advantage of the 3D-Chaotic WOA method over the other WOA
variants, quantitative metrics such as the minimum, standard deviation, mean, median and worst value are used
for comparison, as shown in (Table 2). The comparison of these indices in Table 2 reveals that the
3D-Chaotic WOA method demonstrates relatively superior performance across most metrics, with the exception of
a few where it does not perform optimally. This observation is consistent with the aforementioned conclusions.

Example verification
To substantiate the effectiveness and superiority of the proposed method in the domain of motor fault diagnosis,
an empirical analysis is conducted in this section using motor fault data collected on the motor fault diagnosis
test platform shown in (Fig. 10).

Motor fault diagnostic test bench


The testbed comprises a 1.5 kW motor, a rotor supported by bearings at each end, a planetary gearbox, and a
set of magnetic brakes. It features four motors for testing different fault scenarios: a stator winding fault
AC motor, a rotor bar fault AC motor, and bearing fault AC motors (with inner-ring and outer-ring faults,
respectively), as shown in (Fig. 11).
Among them, Fig. 11a shows the stator winding fault AC motor, in which one coil of the stator winding is routed
out of the motor junction box and connected to an "on/off" switch on the coil line to simulate a short circuit.
When the switch is "on", the winding is in a short-circuit state; when it is "off", the motor reverts to the
normal state. Figure 11b shows the rotor bar fault AC motor, which has 28 rotor bars, four of which are cut;
the imbalance caused by cutting the rotor bars is compensated for by adding a counterbalancing mass.
Figure 11c shows the bearing outer-ring failure, where a crack with a width of 30 mm and a depth of 3 mm is
introduced to mimic the failure of the bearing outer ring.

Motor fault data


In this section, in order to comprehensively explore the fault characteristics of the motor, the motor fault
test platform is used to collect multi-modal fault information from the motor. This includes the environmental
sound signals, the motor running sound signals, the vibration signals from the front and rear ends of the motor,
and the three-phase current signals, each under two load conditions: no load and full load (a magnetic powder
brake simulates the motor load). The detailed data are presented in (Table 3). Compared with single-mode data,
multi-modal fault data provide richer information, leading to higher diagnostic accuracy and detection capability.

Motor fault diagnosis


The motor fault data collected in this section are multi-modal as shown in (Table 3). An independent time series
Transformer network is utilized for each data modality, and the algorithmic framework is shown in (Fig. 4).
Therefore, multi-GPU parallel programming is employed to train the ensemble Transformer network in this
section.


Fig. 8. Convergence curves of the different modified WOA algorithms on the CEC2022 set of optimization
functions.


Computing platform configuration


The main configuration of the computing platform used in this study is as follows.

(1) CPU: Intel(R) Core(TM) i9-10980XE @ 3.00 GHz, 18 cores, 36 threads.
(2) Memory: 8 × 8 GB @ 3600 MHz, quad-channel, 64 GB total capacity.
(3) Graphics cards: 2 × RTX 3080 Ti, 12 GB video memory and 10,240 CUDA cores each.

E-Transformer network optimization with chaotic WOA


Fitness function The motor fault data collected in this paper encompass five different modes, and
correspondingly five parallel time-series Transformer networks need to be constructed, together forming the
E-Transformer network. When optimizing the hyper-parameters of the E-Transformer network using the Chaotic WOA,
the corresponding fitness function is formulated as follows:

Fitness(loss) := min( Σ_{i=1}^{n} fitness_i(loss) )   (29)

where Fitness represents the fitness function of the E-Transformer network, loss denotes the loss function
during the training of a Transformer network, and fitness_i is the fitness function of each independent
Transformer network in the E-Transformer network, with i ranging from 1 to 5.

Steps of the algorithm

1. The data collected for five different modes of motor failure are shown in (Table 3). The length of the
   data in each dimension within each modality is standardized to 1024, and the dataset is then randomly
   divided, with 60% allocated for training, 20% for validation, and 20% for testing.
2. The hyperparameters of each independent Transformer network within the E-Transformer network are initialized
   uniformly: the embedding dimension is set to 8, the number of attention heads to 4, the hidden layer
   dimension to 32, the number of encoder layers to 2, the learning rate to 0.01, the dropout rate to 0.2, and
   the batch size to 8; the execution environment is a GPU.
3. Initialize the Chaotic WOA: chaotic control parameter c = 3.99; variables x1 = rand(·), x2 = rand(·),
   x3 = rand(·), with rand(·) ∈ (0, 1). The total number of initialization tasks for the WOA is E = 5, and the
   dimension of an individual task is D = 7. Population sizes ND = [40, 20, 100, 20, 40, 20, 60]; the maximum
   number of iterations is T = 200. Boundary constraints BD ∈ [b1, b2]^D for the different tasks:
   [b1, b2]1 = [2, 256], [b1, b2]2 = [2, 16], [b1, b2]3 = [8, 1024], [b1, b2]4 = [1, 20],
   [b1, b2]5 = [1e-5, 1e-3], [b1, b2]6 = [0.1, 0.5], [b1, b2]7 = [1, 256]. φ = rand(·) ∈ (0, 1),
   l = rand(·) ∈ [−1, 1], β = [0.1, 0.1, 0.2, 0.2, 0.4], b = 1.
4. According to Eq. (29), the loss function of each independent Transformer network within the E-Transformer
   network serves as the corresponding fitness function, and the minimum of the sum of these fitness functions
   is then used as the fitness function of the E-Transformer network.
5. The E-Transformer network, optimized with the chaotic WOA, is trained and tested.
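To make step 3 concrete, the sketch below shows a hypothetical decoding of one whale's position vector into the seven hyper-parameters; the names, the rounding rule, and the clipping are illustrative assumptions:

```python
import numpy as np

# Per-dimension bounds [b1, b2] from step 3, paired with assumed parameter names.
BOUNDS = [(2, 256), (2, 16), (8, 1024), (1, 20), (1e-5, 1e-3), (0.1, 0.5), (1, 256)]
NAMES = ["embedding", "heads", "hidden", "encoder_layers",
         "learning_rate", "dropout", "batch_size"]
INTEGER = {"embedding", "heads", "hidden", "encoder_layers", "batch_size"}

def decode(position):
    """Clip each coordinate to its bounds and round integer-valued settings."""
    params = {}
    for name, (lo, hi), v in zip(NAMES, BOUNDS, position):
        v = float(np.clip(v, lo, hi))
        params[name] = int(round(v)) if name in INTEGER else v
    return params

print(decode([16.3, 4.2, 256.0, 4.0, 2e-4, 0.2, 64.0]))
```

A decoding of this kind is what turns a continuous WOA search point into a concrete Transformer configuration such as those listed in Table 4.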


Fig. 9. Boxplots of the different modified WOA algorithms on the CEC2022 set of optimization functions.

Data preprocessing
To ensure that the timestamps of the fault data for all modes are aligned, the multi-modal data presented in
Table 3 are collected synchronously, albeit at different sampling rates. This ensures that they represent the
same operational state.
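A minimal sketch of bringing a modality sampled at a different rate to the common length of 1024 used in the training steps (linear interpolation is an assumption; the authors' resampling method is not specified):

```python
import numpy as np

def align_length(signal, target_len=1024):
    """Resample a 1-D signal to a common length by linear interpolation,
    so that modalities sampled at different rates share one time base."""
    old = np.linspace(0.0, 1.0, len(signal))
    new = np.linspace(0.0, 1.0, target_len)
    return np.interp(new, old, signal)

sig = np.sin(np.linspace(0, 2 * np.pi, 20480))   # a raw record of length 20480
print(align_length(sig).shape)
```

Because all modalities are mapped onto the same normalized time axis, a sample index then refers to the same moment of motor operation in every channel.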

Model training and prediction results


To validate the effectiveness of the E-Transformer network, this section explores nine different combinations of
the datasets presented in Table 3, as detailed in (Table 4). Table 4 lists the nine data combination methods and


Methods
Function Index 3d-chaotic WOA QINWOA AIBWOA AdBet-WOA HWOA-CHM
Minimum 300.0250635 300.3423149 300.0153407 364.9593177 300.0000006
Standard 914.7077538 1519.334743 3.876601931 2349.719021 41.42580328
F1 Average 617.8334396 1651.470084 301.456299 2514.422335 314.1116789
Median 336.4795434 1029.229123 300.2357751 1783.687283 300.0211391
Worse 5138.832598 6723.510192 318.8200822 10,198.67308 503.8604343
Minimum 404.3815468 400.5527203 400.0028717 400.4385323 400.210802
Standard 25.1316486 32.66327971 41.99181061 23.86996472 26.29028237
F2 Average 416.6808388 427.0438681 428.9774053 429.1913962 417.7832748
Median 407.6374296 409.1591576 408.9161019 416.2478595 408.9161019
Worse 484.8431648 495.3360541 585.5469193 471.2480942 477.983082
Minimum 600.000016 602.4012193 600.0001427 600.1316769 600.000135
Standard 5.546054308 6.850250975 0.546075183 1.406131003 7.949321983
F3 Average 604.5707309 610.2948183 600.2398369 601.2839783 605.0288421
Median 603.8688255 608.9102522 600.0012997 600.6974039 601.8340465
Worse 620.7524845 627.7869786 601.8487432 606.0197001 637.4782999
Minimum 805.9697543 811.9394986 804.9747954 804.9965828 816.9142889
Standard 8.609033873 10.66540885 7.961143489 8.90366749 10.25609173
F4 Average 828.256767 829.597794 817.0137698 818.3510741 834.3345457
Median 829.8487011 829.1178856 814.9243531 817.6528934 832.9824747
Worse 849.7477628 862.6820819 836.8134045 840.348724 864.6721014
Minimum 908.4450058 902.069867 900 900.1396861 913.4281536
Standard 228.9036172 94.58631233 0.468744918 43.74009323 207.0330163
F5 Average 1300.489351 970.5580724 900.3837458 926.3064774 1348.877427
Median 1447.395084 946.3284247 900.3166908 902.4454148 1467.951781
Worse 1483.202544 1390.865038 902.2717141 1108.329339 1487.565168
Minimum 2494.823705 1876.836312 1852.035243 2257.744187 1820.75571
Standard 1909.810849 2751.797318 2016.367778 2301.836277 2004.77828
F6 Average 5351.726764 5094.929231 4424.672053 6740.395293 4061.0663
Median 5169.601991 5306.516837 4254.18003 7876.052764 3901.605601
Worse 8295.688298 8272.493531 8076.248313 10,214.99967 7999.76305
Minimum 2019.901264 2021.515424 2001.011082 2001.329454 2000.624366
Standard 15.50434406 20.60298244 8.677268435 18.60571535 41.73787101
F7 Average 2027.225418 2040.331848 2017.918513 2033.414812 2040.722251
Median 2022.328497 2034.223974 2021.247387 2026.18004 2024.735646
Worse 2105.933794 2090.268307 2029.948415 2107.170341 2154.415744
Minimum 2213.659327 2220.066228 2200.136234 2222.237659 2220.007294
Standard 22.05781276 34.90555442 42.76430493 22.84673072 36.80204685
F8 Average 2226.631257 2238.957096 2235.100671 2229.863625 2233.619762
Median 2221.528429 2228.555993 2221.056711 2225.597736 2221.258828
Worse 2340.846033 2345.840927 2343.240369 2349.89066 2343.532162
Minimum 2529.284383 2529.284383 2529.284383 2529.311482 2529.284383
Standard 37.27783007 39.11826671 23.62135286 41.6013159 26.82598082
F9 Average 2539.079846 2550.319662 2534.643385 2576.798203 2534.182114
Median 2529.284383 2529.840387 2529.284383 2570.708646 2529.284383
Worse 2676.216331 2655.353113 2658.177527 2662.936412 2676.216331
Minimum 2500.337268 2500.519801 2500.341788 2500.326469 2425.239552
Standard 63.22821373 68.81636323 62.09839178 71.87232884 342.5445618
F10 Average 2544.633363 2552.299604 2582.464434 2555.807745 2707.995434
Median 2500.754857 2501.415606 2611.654448 2500.513981 2623.430333
Worse 2642.87555 2657.672139 2740.447454 2791.125486 3731.557739
Minimum 2600.0000 2600.0000 2600.009191 2608.498361 2600.0000
Standard 145.5219294 257.8538256 147.2691792 181.072179 129.6653296
F11 Average 2755.86114 2791.958702 2825.988824 2986.994476 2825.860421
Median 2750.450127 2750.482626 2900.001203 2969.731174 2900.0000
Worse 2912.302606 3960.855814 3212.652089 3380.961189 2912.720863
Minimum 2861.435268 2862.442121 2863.265769 2860.877784 2862.567376
Standard 5.73684441 7.723810884 15.69866038 8.14140488 8.876280002
F12 Average 2866.772143 2870.296529 2875.413171 2869.330674 2869.243365
Median 2865.021458 2869.196163 2869.690723 2865.711272 2866.505986
Worse 2886.136 2904.281546 2929.029804 2895.066283 2903.406637

Table 2. The index values of different modified WOA algorithms for the set of CEC2022 optimization
functions.

Fig. 10. Test platform for motor fault diagnosis.

their corresponding optimal combination values for the E-Transformer network hyperparameters. From the
combination values in Table 4, it is observed that the hyperparameter combinations obtained by the Chaotic WOA
vary when different dataset combinations are used for training. The optimized E-Transformer network is tested
on these nine datasets, and the results are shown in (Fig. 12).
Upon closely examining the training and testing results presented in Fig. 12, it is evident that the data of
different modalities and their various combinations play a pivotal role in, and exert a significant influence
on, the prediction outcomes. To further substantiate the advantages of the proposed method, the results
depicted in Fig. 12 were analyzed, with the findings presented in (Table 5). The statistical analysis of the
data in Table 5 shows that the fault-identification accuracy of Set 12 is significantly improved. It is
apparent that utilizing multi-modal motor fault data contributes to enhancing the accuracy of fault diagnosis.
Concurrently, this result confirms the advantages of the proposed approach.

Comparative analysis
To further verify the advantages of the proposed method, this study used a Long Short-Term Memory network
(LSTM), a Convolutional Neural Network (CNN), a Gated Recurrent Unit (GRU), a Bidirectional Long Short-Term
Memory network (Bi-LSTM) and a Recurrent Neural Network (RNN) as substitutes for the Transformer module
in the E-Transformer network. To ensure the validity and fairness of the experiments, the hyperparameters
of these alternative algorithms were selected based on the relevant literature, and the Chaotic whale
optimization algorithm (Chaotic WOA) was then employed to optimize the hyperparameter combinations of these
replacement algorithms. Finally, dataset nine was used for both training and testing, and the results are
shown in (Table 6), which compares the following metrics: prediction results, accuracy, recall, F1 score,
and training time. Examining the index values of the different methods in Table 6 shows that the E-Transformer
exhibits a significant advantage in all metrics except training time, where it is the longest. Therefore, it
can be concluded from the experimental comparison that the proposed method possesses the best predictive power.


Fig. 11. Motor fault: (a) stator short circuit; (b) broken rotor bar; (c) The bearing outer ring is damaged.


Motor fault type Load condition Operating frequency Data type Data dimension Data length Labels
Ambient sound 1
Device sound 2
Normal No load/full load 50 Hz Front end vibration 3 0
Back end vibration 3
Current 3
Ambient sound 1
Device sound 2
Bearing outer ring fault No load/full load 50 Hz Front end vibration 3 1
Back end vibration 3
Current 3
20480
Ambient sound 1
Device sound 2
Stator interturn short circuit No load/full load 50 Hz Front end vibration 3 2
Back end vibration 3
Current 3
Ambient sound 1
Device sound 2
Rotor broken No load/full load 50 Hz Front end vibration 3 3
Back end vibration 3
Current 3

Table 3. Multi-modal signals of motor fault.

Discussion and conclusion


Induction motors in industrial production face risks like rotor fracture, stator winding coil short-circuit, and
bearing faults due to harsh conditions and load changes. These faults affect performance and can stop production.
This paper presents an innovative motor fault diagnosis method. It combines Chaotic WOA and Transformer
network, using multi-modal time series data for better fault early warning and diagnosis. The chaotic WOA optimizes the Transformer network to handle the complexity of these data, improving fault identification and localization, raising diagnostic accuracy, and supporting motor maintenance. The main points of the paper are as follows.
(1) In deep learning, Transformer networks are widely used for sequence modeling, especially with time series data. The proposed time series-based ensemble (parallel) Transformer network combines multiple models in parallel: each model processes its input time series independently, capturing long-term dependencies and patterns, while parallel computation speeds up data processing and improves both learning and inference. (2) In motor fault diagnosis, diverse data are key, so multiple sensors are used: a current sensor acquires the motor’s three-phase current signals, vibration sensors capture 3D vibration signals, and 2D acoustic signals from the motor and 1D ambient acoustic signals are also collected. Together, these help build better diagnosis models. (3) The choice of optimization algorithm also matters. WOA has advantages but needs improvement in global search and convergence, so a modified 3D Logistic-Sine map is introduced; it combines the logistic map’s nonlinearity with the sine map’s periodicity to cover the search space more evenly. This balances exploration and exploitation, speeds convergence, and helps find optimal Transformer network hyper-parameters for better classification. (4) Many recent swarm intelligence optimization algorithms are homogeneous in mechanism and operation, being essentially repackaged versions of older algorithms, which restricts innovation and development and limits their ability to handle complex challenges. To break through this, future algorithms should focus on creating universal rules by exploring commonalities among problems and algorithms to find more general models for different scenarios. (5) This paper focuses on single faults in induction motors, such as broken rotor bars, stator coil short circuits, and bearing cracks, spalling, and pitting; through data collection, signal processing, and the proposed algorithms, it diagnoses single faults precisely. Multiple concurrent faults are more complex, however, because they interact with and obscure one another, making diagnosis harder, so the method’s effectiveness for multiple faults needs further validation despite its good single-fault performance.
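The parallel-branch design in point (1) ends with a fusion step that combines the branches’ outputs into one decision. The exact fusion rule is not restated in this section, so the sketch below uses soft voting (averaging each branch’s class probabilities and taking the arg-max), a common but assumed choice; the function names and toy logits are hypothetical.

```python
import math

def softmax(logits):
    # Numerically stable softmax over one sample's class logits.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def fuse_predictions(branch_logits):
    # Soft-voting fusion across modality branches: average each
    # branch's class probabilities per sample, then take the arg-max.
    # branch_logits: list over branches, each a list of per-sample
    # logit vectors of equal length.
    n_branches = len(branch_logits)
    n_samples = len(branch_logits[0])
    fused = []
    for i in range(n_samples):
        probs = [softmax(branch[i]) for branch in branch_logits]
        mean = [sum(p[c] for p in probs) / n_branches
                for c in range(len(probs[0]))]
        fused.append(mean.index(max(mean)))
    return fused
```

When the branches disagree, the sample is assigned to whichever class has the larger mean probability, so more confident branches dominate without any branch being hard-coded as the arbiter.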
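The chaotic initialization in point (3) can be sketched as a 3D logistic-sine coupling. The paper’s exact map definition is not reproduced in this section, so the map form below, the control parameter `r`, and the seed values are illustrative assumptions that only keep the named ingredients: logistic nonlinearity, sine periodicity, and three coupled state variables.

```python
import math

def logistic_sine_3d(x, y, z, r=3.99):
    # One iteration of an illustrative 3D logistic-sine coupling:
    # each state mixes its own logistic term with a sine term driven
    # by the next state; the modulo keeps values in [0, 1).
    xn = (r * x * (1 - x) + (4 - r) * math.sin(math.pi * y) / 4) % 1.0
    yn = (r * y * (1 - y) + (4 - r) * math.sin(math.pi * z) / 4) % 1.0
    zn = (r * z * (1 - z) + (4 - r) * math.sin(math.pi * x) / 4) % 1.0
    return xn, yn, zn

def chaotic_population(n_agents, dim, seed=(0.23, 0.56, 0.81)):
    # Replace WOA's uniform random initialization with the chaotic
    # sequence so the initial whales spread over [0, 1)^dim.
    x, y, z = seed
    population = []
    for _ in range(n_agents):
        agent = []
        for _ in range(dim):
            x, y, z = logistic_sine_3d(x, y, z)
            agent.append(x)
        population.append(agent)
    return population
```

Each hyper-parameter dimension would then be rescaled from [0, 1) to its own search interval (learning rate, hidden dimension, and so on) before fitness evaluation.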

Scientific Reports | (2025) 15:7834 | https://doi.org/10.1038/s41598-025-89695-6 22


www.nature.com/scientificreports/

Hyper-parameters of the Transformer network

Dataset | Data types | Embedding | Head | Hidden layer dimension | Encoder layers | Learning rate | Dropout | Batch size
Set 1 | Device sound | 16 | 4 | 256 | 4 | 0.0002 | 0.2 | 64
Set 1 | Ambient sound | 16 | 4 | 64 | 2 | 0.0001 | 0.1 | 16
Set 2 | Front end vibration | 32 | 4 | 256 | 4 | 0.0002 | 0.2 | 64
Set 2 | Ambient sound | 16 | 4 | 64 | 2 | 0.0001 | 0.1 | 16
Set 3 | Back end vibration | 32 | 8 | 256 | 4 | 0.0002 | 0.2 | 32
Set 3 | Ambient sound | 16 | 4 | 64 | 2 | 0.0001 | 0.1 | 8
Set 4 | Current | 64 | 8 | 256 | 4 | 0.0001 | 0.1 | 64
Set 4 | Ambient sound | 16 | 4 | 64 | 2 | 0.0002 | 0.2 | 16
Set 5 | Front end vibration | 32 | 8 | 256 | 6 | 0.0001 | 0.2 | 32
Set 5 | Back end vibration | 32 | 4 | 256 | 4 | 0.0001 | 0.1 | 64
Set 5 | Ambient sound | 16 | 4 | 64 | 2 | 0.0001 | 0.1 | 16
Set 6 | Device sound | 128 | 4 | 512 | 8 | 0.0005 | 0.1 | 32
Set 6 | Current | 64 | 8 | 256 | 4 | 0.0001 | 0.1 | 64
Set 6 | Ambient sound | 16 | 4 | 64 | 2 | 0.0001 | 0.1 | 16
Set 7 | Device sound | 64 | 8 | 256 | 4 | 0.0001 | 0.2 | 32
Set 7 | Front end vibration | 32 | 4 | 256 | 4 | 0.0002 | 0.2 | 64
Set 7 | Back end vibration | 32 | 8 | 256 | 4 | 0.0002 | 0.2 | 32
Set 7 | Ambient sound | 16 | 4 | 64 | 2 | 0.0001 | 0.1 | 16
Set 8 (1d) | Device sound | 64 | 8 | 256 | 8 | 0.0001 | 0.1 | 32
Set 8 (1d) | Current | 128 | 8 | 256 | 6 | 0.0001 | 0.1 | 16
Set 8 (1d) | Front end vibration | 64 | 8 | 256 | 4 | 0.0002 | 0.2 | 32
Set 8 (1d) | Back end vibration | 128 | 8 | 256 | 4 | 0.0001 | 0.1 | 32
Set 8 (1d) | Ambient sound | 32 | 4 | 128 | 4 | 0.0002 | 0.2 | 16
Set 9 (3d) | Device sound | 64 | 8 | 512 | 8 | 0.0001 | 0.1 | 32
Set 9 (3d) | Current | 128 | 8 | 256 | 6 | 0.0001 | 0.1 | 16
Set 9 (3d) | Front end vibration | 64 | 8 | 256 | 4 | 0.0002 | 0.2 | 32
Set 9 (3d) | Back end vibration | 128 | 8 | 256 | 4 | 0.0001 | 0.1 | 32
Set 9 (3d) | Ambient sound | 32 | 4 | 128 | 4 | 0.0002 | 0.2 | 16
Set 10 (6d) | Device sound | 64 | 8 | 512 | 8 | 0.00005 | 0.1 | 32
Set 10 (6d) | Current | 128 | 8 | 256 | 6 | 0.0001 | 0.1 | 16
Set 10 (6d) | Front end vibration | 64 | 8 | 256 | 4 | 0.0002 | 0.2 | 32
Set 10 (6d) | Back end vibration | 128 | 8 | 256 | 4 | 0.0001 | 0.1 | 32
Set 10 (6d) | Ambient sound | 32 | 4 | 128 | 4 | 0.0002 | 0.2 | 16
Set 11 (9d) | Device sound | 64 | 8 | 512 | 8 | 0.00005 | 0.1 | 32
Set 11 (9d) | Current | 128 | 8 | 256 | 6 | 0.0001 | 0.1 | 16
Set 11 (9d) | Front end vibration | 64 | 8 | 256 | 4 | 0.0002 | 0.2 | 32
Set 11 (9d) | Back end vibration | 128 | 8 | 256 | 4 | 0.0001 | 0.1 | 32
Set 11 (9d) | Ambient sound | 32 | 4 | 128 | 4 | 0.0002 | 0.2 | 16
Set 12 (12d) | Device sound | 64 | 8 | 512 | 8 | 0.00005 | 0.1 | 32
Set 12 (12d) | Current | 128 | 8 | 256 | 6 | 0.0001 | 0.1 | 16
Set 12 (12d) | Front end vibration | 64 | 8 | 256 | 4 | 0.0002 | 0.2 | 32
Set 12 (12d) | Back end vibration | 128 | 8 | 256 | 4 | 0.0001 | 0.1 | 32
Set 12 (12d) | Ambient sound | 32 | 4 | 128 | 4 | 0.0002 | 0.2 | 16

Table 4. The best combination values of the hyper-parameters of the E-Transformer network for the twelve data sets (Set 1 to Set 12).
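For implementation, per-modality settings such as those in Table 4 can be stored as plain configuration records so that each Transformer branch is constructed from its own entry. A minimal sketch using the Set 12 values; the key names (`embedding`, `heads`, `hidden`, and so on) are assumed shorthand for the table’s column headings.

```python
# Per-modality hyper-parameters for Set 12, transcribed from Table 4.
# Each record parameterizes one Transformer branch of the ensemble.
SET12_CONFIG = {
    "device_sound":        dict(embedding=64,  heads=8, hidden=512, layers=8, lr=5e-5, dropout=0.1, batch=32),
    "current":             dict(embedding=128, heads=8, hidden=256, layers=6, lr=1e-4, dropout=0.1, batch=16),
    "front_end_vibration": dict(embedding=64,  heads=8, hidden=256, layers=4, lr=2e-4, dropout=0.2, batch=32),
    "back_end_vibration":  dict(embedding=128, heads=8, hidden=256, layers=4, lr=1e-4, dropout=0.1, batch=32),
    "ambient_sound":       dict(embedding=32,  heads=4, hidden=128, layers=4, lr=2e-4, dropout=0.2, batch=16),
}
```

Keeping the records separate per modality mirrors the table’s structure: the chaotic WOA can then tune each branch’s entry independently.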


Fig. 12. Model training and prediction results for nine different data combinations.


Data set | Training accuracy (%) | Validation accuracy (%) | Testing accuracy (%)
Set 1 | 40.12 | 39.41 | 39.73
Set 2 | 96.18 | 74.01 | 85.33
Set 3 | 95.61 | 76.28 | 84.35
Set 4 | 97.01 | 74.17 | 85.95
Set 5 | 97.14 | 81.21 | 89.41
Set 6 | 96.51 | 81.58 | 89.47
Set 7 | 99.21 | 84.75 | 90.66
Set 8 | 99.60 | 88.15 | 92.29
Set 9 | 99.84 | 91.17 | 94.88
Set 10 | 99.13 | 93.15 | 96.13
Set 11 | 99.08 | 95.64 | 97.17
Set 12 | 99.86 | 99.76 | 99.10

Table 5. Comparison of training and prediction results for the twelve data combinations. Bold values highlight the improvement in diagnostic accuracy brought by multi-modal data, as discussed in the text.


Method | Label | Accuracy | Recall | F1 score | Results | Training time (minute)
RNNs | 0 | 1.00 | 0.92 | 0.96 | 0.81 | 82.33
RNNs | 1 | 0.86 | 0.50 | 0.63 | |
RNNs | 2 | 0.67 | 0.86 | 0.75 | |
RNNs | 3 | 0.83 | 1.00 | 0.91 | |
CNNs | 0 | 1.00 | 0.79 | 0.88 | 0.82 | 71.58
CNNs | 1 | 0.71 | 0.50 | 0.59 | |
CNNs | 2 | 0.68 | 1.00 | 0.81 | |
CNNs | 3 | 1.00 | 0.95 | 0.97 | |
GRUs | 0 | 1.00 | 0.92 | 0.96 | 0.83 | 73.15
GRUs | 1 | 0.88 | 0.62 | 0.73 | |
GRUs | 2 | 0.72 | 0.82 | 0.77 | |
GRUs | 3 | 0.80 | 1.00 | 0.89 | |
LSTMs | 0 | 0.81 | 1.00 | 0.90 | 0.93 | 85.24
LSTMs | 1 | 1.00 | 0.74 | 0.85 | |
LSTMs | 2 | 1.00 | 1.00 | 1.00 | |
LSTMs | 3 | 1.00 | 1.00 | 1.00 | |
Bid-LSTMs | 0 | 1.00 | 0.92 | 0.96 | 0.94 | 87.15
Bid-LSTMs | 1 | 0.91 | 0.88 | 0.89 | |
Bid-LSTMs | 2 | 0.88 | 1.00 | 0.93 | |
Bid-LSTMs | 3 | 1.00 | 0.95 | 0.97 | |
E-Transformers | 0 | 1.00 | 1.00 | 0.99 | 0.99 | 192.38
E-Transformers | 1 | 0.99 | 1.00 | 1.00 | |
E-Transformers | 2 | 0.98 | 0.98 | 0.99 | |
E-Transformers | 3 | 1.00 | 0.99 | 0.99 | |

Table 6. Comparison results between the E-Transformer network and the alternative networks. Bold values highlight the test metrics of the proposed method to facilitate comparison with the other approaches.
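The per-class values in Table 6 follow the usual classification-report quantities. Assuming the three per-class columns are precision, recall, and F1 score (the flattened header is ambiguous about the first), they can be recomputed from true and predicted labels as in this sketch; `per_class_metrics` is a hypothetical helper name.

```python
def per_class_metrics(y_true, y_pred, n_classes):
    # Per-class precision, recall, and F1 plus overall accuracy,
    # matching the quantities reported per label in Table 6.
    metrics = {}
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        metrics[c] = (prec, rec, f1)
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return metrics, acc
```

Per-class recall is what exposes a model that hides one fault class inside another, which is why the table reports these values alongside overall accuracy.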

Data availability
The data that support the findings of this study are available upon reasonable request from the corresponding
author.

Received: 14 June 2024; Accepted: 6 February 2025

References
1. Yang, Y. Y., Haque, M. M. M., Bai, D. L. & Tang, W. Fault diagnosis of electric motors using deep learning algorithms and its
application: A review. Energies 14, 7017 (2021).
2. Xu, B., Zhou, F. X., Li, H. P., Yan, B. K. & Liu, Y. Early fault feature extraction of bearings based on Teager energy operator and
optimal VMD. ISA Trans. 86, 249–265 (2019).
3. Chen, P. Y., Chao, K. H. & Tseng, Y. C. A motor fault diagnosis system based on cerebellar model articulation controller. IEEE
Access 7, 120326–120336 (2019).
4. Gyftakis, K. N., Spyropoulos, D. V., Kappatou, J. C. & Mitronikas, E. D. A Novel approach for broken bar fault diagnosis in
induction motors through torque monitoring. IEEE Trans. Energy Convers. 28, 267–277 (2013).
5. Gao, C. X., Lv, K., Si, J. K., Feng, H. C. & Hu, Y. H. Research on interturn short-circuit fault indicators for direct-drive permanent
magnet synchronous motor. IEEE J. Emerg. Sel. Top. Power Electron. 10, 1902–1914 (2022).
6. Contreras-Hernandez, J. L. et al. Geometric analysis of signals for inference of multiple faults in induction motors. Sensors 22, 2622
(2022).
7. Atta, M. E. E. D., Ibrahim, D. K. & Gilany, M. I. Broken bar fault detection and diagnosis techniques for induction motors and
drives: State of the art. IEEE Access 10, 88504–88526 (2022).
8. Zeng, C., Huang, S., Lei, J. Y., Wan, Z. X. & Yang, Y. M. Online rotor fault diagnosis of permanent magnet synchronous motors
based on stator tooth flux. IEEE Trans. Ind. Appl. 57(3), 2366–2377 (2021).
9. Contreras-Hernandez, J. L. et al. Quaternion signal analysis algorithm for induction motor fault detection. IEEE Trans. Ind.
Electron. 66, 8843–8850 (2019).
10. Zhou, H. L., Liu, Z. Y. & Yang, X. W. Motor torque fault diagnosis for four wheel independent motor-drive vehicle based on
unscented kalman filter. IEEE Trans. Veh. Technol. 67, 1969–1976 (2018).
11. Martin-Diaz, I., Morinigo-Sotelo, D., Duque-Perez, O. & Romero-Troncoso, R. J. An experimental comparative evaluation of
machine learning techniques for motor fault diagnosis under various operating conditions. IEEE Trans. Ind. Appl. 54, 2215–2224
(2018).
12. Kullu, O. & Cinar, E. Deep-learning-based multi-modal sensor fusion approach for detection of equipment faults. Machines 11,
1105 (2022).
13. Lang, W. J. et al. Artificial intelligence-based technique for fault detection and diagnosis of EV motors: A review. IEEE Trans.
Transp. Electrif. 8, 384–406 (2022).


14. Liu, R. N., Wang, F., Yang, B. Y. & Qin, S. J. Multiscale kernel based residual convolutional neural network for motor fault diagnosis
under nonstationary conditions. IEEE Trans. Ind. Inform. 16, 3797–3806 (2020).
15. Wang, J. J., Fu, P. L., Ji, S. H., Li, Y. L. & Gao, R. X. A light weight multisensory fusion model for induction motor fault diagnosis.
IEEE-ASME Trans. Mechatron. 27, 4932–4941 (2022).
16. An, K. et al. Edge solution for real-time motor fault diagnosis based on efficient convolutional neural network. IEEE Trans.
Instrum. Meas. 72, 1–12 (2023).
17. Sun, W. J. et al. A sparse auto-encoder-based deep neural network approach for induction motor faults classification. Measurement
89, 171–178 (2016).
18. Hoang, D. T. & Kang, H. J. A motor current signal-based bearing fault diagnosis using deep learning and information fusion. IEEE
Trans. Instrum. Meas. 69, 3325–3333 (2020).
19. Ribeiro, R. F. et al. Fault detection and diagnosis in electric motors using 1d convolutional neural networks with multi-channel
vibration signals. Measurement 190, 110759 (2022).
20. Attestog, S., Senanayaka, J. S. L., Van Khang, H. & Robbersmyr, K. G. Robust active learning multiple fault diagnosis of PMSM
drives with sensorless control under dynamic operations and imbalanced datasets. IEEE Trans. Ind. Inform. 19, 9291–9301 (2023).
21. Kao, I. H., Wang, W. J., Lai, Y. H. & Perng, J. W. Analysis of permanent magnet synchronous motor fault diagnosis based on
learning. IEEE Trans. Instrum. Meas. 68, 310–324 (2019).
22. Li, Z. Y., Wu, Q. M., Yang, S. M. & Chen, X. P. Diagnosis of rotor demagnetization and eccentricity faults for IPMSM based on deep
CNN and image recognition. Complex Intell. Syst. 8, 5469–5488 (2022).
23. Zhang, X. T. et al. Inferable deep distilled attention network for diagnosing multiple motor bearing faults. IEEE Trans. Transp.
Electrif. 9, 2207–2216 (2023).
24. Chen, J. J. et al. Novel data-driven approach based on capsule network for intelligent multi-fault detection in electric motors. IEEE
Trans. Energy Convers. 36, 2173–2184 (2021).
25. Principi, E., Rossetti, D., Squartini, S. & Piazza, F. Unsupervised electric motor fault detection by using deep autoencoders. IEEE-
CAA J. Autom. Sin. 6, 441–451 (2019).
26. Wang, F., Liu, R. N., Hu, Q. H. & Chen, X. F. Cascade convolutional neural network with progressive optimization for motor fault
diagnosis under nonstationary conditions. IEEE Trans. Ind. Inform. 17, 2511–2521 (2021).
27. Khanjani, M. & Ezoji, M. Electrical fault detection in three-phase induction motor using deep network-based features of
thermograms. Measurement 173, 108622 (2021).
28. Jang, J. G. et al. Vibration data feature extraction and deep learning-based preprocessing method for highly accurate motor fault
diagnosis. J. Comput. Des. Eng. 10, 204–220 (2023).
29. Zhu, Q. Y. et al. Real-time quality inspection of motor rotor using cost-effective intelligent edge system. IEEE Internet Things J. 10,
7393–7404 (2023).
30. Xu, P., Zhu, X. T. & Clifton, D. A. Multimodal learning with transformers: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 45,
12113–12132 (2023).
31. Xu, N., Mao, W. J., Wei, P. H. & Zeng, D. MDA: Multimodal data augmentation framework for boosting performance on sentiment/
emotion classification tasks. IEEE Intell. Syst. 36, 3–12 (2021).
32. Dai, Y. L., Yan, Z., Cheng, J. C., Duan, X. J. & Wang, G. J. Analysis of multimodal data fusion from an information theory
perspective. Inf. Sci. 623, 164–183 (2023).
33. Mu, S., Cui, M. & Huang, X. D. Multimodal data fusion in learning analytics: A systematic review. Sensors 20, 6856 (2020).
34. Qi, Q. F., Lin, L. Y., Zhang, R. & Xue, C. R. MEDT: Using multimodal encoding-decoding network as in transformer for multimodal
sentiment analysis. IEEE Access 10, 28750–28759 (2022).
35. Pawłowski, M., Wroblewska, A. & Sysko-Romanczuk, S. Effective techniques for multimodal data fusion: A comparative analysis.
Sensors 23, 2381 (2023).
36. Ma, M., Sun, C. & Chen, X. F. Deep coupling autoencoder for fault diagnosis with multimodal sensory data. IEEE Trans. Ind.
Inform. 14, 1137–1145 (2018).
37. Li, Q. F. & Li, L. X. Integrative factor regression and its inference for multimodal data analysis. J. Am. Stat. Assoc. 117, 2207–2221
(2022).
38. Sleeman, W. C., Kapoor, R. & Ghosh, P. Multimodal classification: Current landscape, taxonomy and future directions. ACM
Comput. Surv. 7, 1–31 (2022).
39. Deng, L. Y. & Liu, S. Y. Deficiencies of the whale optimization algorithm and its validation method. Expert Syst. Appl. 237, 121544
(2024).
40. Zhou, R. H., Zhang, Y. & He, K. A novel hybrid binary whale optimization algorithm with chameleon hunting mechanism for
wrapper feature selection in QSAR classification model: A drug-induced liver injury case study. Expert Syst. Appl. 234, 121015
(2023).
41. Bezerra, J. I. M., Camargo, V. V. D. & Molter, A. A new efficient permutation-diffusion encryption algorithm based on a chaotic
map. Chaos Solitons Fractals 151, 111235 (2021).
42. Xu, Y. F., Zhou, S. B. & Huang, Y. H. Transformer-based model with dynamic attention pyramid head for semantic segmentation
of VHR remote sensing imagery. Entropy 24, 1619 (2022).
43. Mirjalili, S. & Lewis, A. The whale optimization algorithm. Adv. Eng. Softw. 95, 51–67 (2016).
44. Liu, H. B., Abraham, A. & Clerc, M. Chaotic dynamic characteristics in swarm intelligence. Appl. Soft Comput. 7, 1019–1026
(2007).
45. He, Y. Y., Zhou, J. Z., Xiang, X. Q., Chen, H. & Qin, H. Comparison of different chaotic maps in particle swarm optimization
algorithm for long-term cascaded hydroelectric system scheduling. Chaos Solitons Fractals 42, 3169–3176 (2009).
46. Hua, Z. Y. & Zhou, Y. C. Image encryption using 2D logistic-adjusted-sine map. Inf. Sci. 339, 237–253 (2016).
47. Hua, Z. Y., Zhou, Y. C., Pun, C. M. & Chen, C. L. P. 2D sine logistic modulation map for image encryption. Inf. Sci. 297, 80–94
(2015).
48. Ghebleh, M. & Kanso, A. A novel efficient image encryption scheme based on chained skew tent maps. Neural Comput. Appl. 31,
2415–2430 (2019).
49. Zhang, Z. H., Wang, H. W. & Gao, Y. H. C2MP: Chebyshev chaotic map-based authentication protocol for RFID applications. Pers.
Ubiquit. Comput. 19, 1053–1061 (2015).
50. Wu, C. Y., Sun, K. H. & Xiao, Y. A hyperchaotic map with multi-elliptic cavities based on modulation and coupling. Eur. Phys. J.
Spec. Top. 230, 2011–2020 (2021).
51. Bodaghi, A. & Fosner, A. Characterization, stability and hyperstability of multi-quadratic-cubic mappings. J. Inequal. Appl. 2021,
12 (2021).
52. De la Fraga, L. G., Mancillas-López, C. & Tlelo-Cuautle, E. Designing an authenticated hash function with a 2D chaotic map.
Nonlinear Dyn. 104, 4569–4580 (2021).
53. Tanveer, M. et al. Multi-images encryption scheme based on 3D chaotic map and substitution box. IEEE Access 9, 73924–73937
(2021).
54. Wei, J. M., Chen, Y. Q., Yu, Y. G. & Chen, Y. Q. Optimal randomness in swarm-based search. Mathematics 7, 828 (2019).
55. Hua, Z. Y., Jin, F., Xu, B. X. & Huang, H. J. 2D logistic-sine-coupling map for image encryption. Signal Process. 149, 148–161 (2018).
56. Huang, H. Q. Novel scheme for image encryption combining 2d logistic-sine-cosine map and double random-phase encoding.
IEEE Access 7, 177988–177996 (2019).


57. Gu, G. S. & Ling, J. A fast image encryption method by using chaotic 3D cat maps. Optik 125, 4700–4705 (2014).
58. Tang, X. F. et al. A physical layer security-enhanced scheme in CO-OFDM system based on CIJS encryption and 3D-LSCM chaos.
J. Lightwave Technol. 40, 3567–3575 (2022).
59. Sathiyamurthi, P. & Ramakrishnan, S. Speech encryption algorithm using FFT and 3D-Lorenz-Logistic chaotic map. Multimed.
Tools Appl. 79, 17817–17835 (2020).

Acknowledgements
This work is supported in part by the National Natural Science Foundation of China (No. 51975433) and the Research Project of the Hubei Provincial Department of Education under Grant B2022203.

Author contributions
B.X. and H.L. wrote the main manuscript text, and F.Z. designed the optimization algorithm. R.D. designed the
classification model. All authors reviewed the manuscript.

Declarations

Competing interests
The authors declare no competing interests.

Additional information
Correspondence and requests for materials should be addressed to H.L.
Reprints and permissions information is available at www.nature.com/reprints.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives
4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in
any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide
a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have
permission under this licence to share adapted material derived from this article or parts of it. The images or
other third party material in this article are included in the article’s Creative Commons licence, unless indicated
otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence
and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to
obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

© The Author(s) 2025
