Autoencoder-Based Iterative Modeling and Multivariate Time-Series Subsequence Clustering Algorithm
ABSTRACT This paper introduces an algorithm for the detection of change-points and the identification of the corresponding subsequences in transient multivariate time-series data (MTSD). The analysis of such data has become increasingly important due to its growing availability in many industrial fields. Labeling, sorting or filtering highly transient measurement data for training Condition-based Maintenance (CbM) models is cumbersome and error-prone. For some applications it can be sufficient to filter measurements by simple thresholds or to find change-points based on changes in mean value and variation. But for a robust diagnosis of, for example, a component within a component group that exhibits complex non-linear correlations between multiple sensor values, such a simple approach would not be feasible: no meaningful and coherent measurement data that could be used for training a CbM model would emerge. Therefore, we introduce an algorithm that uses a recurrent neural network (RNN) based Autoencoder (AE) which is iteratively trained on incoming data. The scoring function uses the reconstruction error and latent space information. A model of the identified subsequence is saved and used for recognition of repeating subsequences as well as fast offline clustering. For evaluation, we propose a new similarity measure based on the curvature for a more intuitive time-series subsequence clustering metric. A comparison with seven other state-of-the-art algorithms and eight datasets shows the capability and the increased performance of our algorithm for clustering MTSD online and offline in conjunction with mechatronic systems.
INDEX TERMS Condition-based maintenance, multivariate time-series data, change point detection, unsupervised clustering, autoencoder, segmentation, subsequence clustering.
in idle mode during waiting or preparation time for a longer trip or other measurements. Also, very transient episodes exist (e.g., Real Driving Emissions (RDE)). All of these measurements do not necessarily share the same calibration of the underlying mechatronic system. To train a robust model of the mechatronic system, component group or a single component, a big effort has to be put into the design of the experiments alone, not to mention the experiments themselves. Therefore, a method of automatically labeling existing measurements is advantageous. Afterwards, an automatic sorting of the labeled time sequences by statistical methods is possible, enabling a data-driven mechatronic diagnosis approach.

Using advanced unsupervised approaches for CbM allows the data to be unlabeled (otherwise supervised methods could be used). In this case the labeling refers to the label of the condition (mechanical degradation) of the monitored system. When trying to diagnose mechatronic systems that have many operating points and are free to transfer in between those, or are capable of totally transient operation modes, a robust diagnosis of the actual condition of the mechatronic system is extremely challenging. An early and reliable (robust) diagnosis of a mechatronic system prevents accidents, enables optimal maintenance and increases uptime of machinery. Without knowledge of the current condition of the system, fault prevention can only be done by predetermined maintenance intervals. The motivation is therefore to monitor the health condition of the mechatronic system as closely as possible, resulting in the task of separating discrete sensory data into uniquely identifiable and recognizable segments or subsequences. This is beneficial to the performance of anomaly detection, because once all normally occurring subsequences are identified, the detection of abnormal or faulty subsequences is straightforward.

When monitoring the health condition of a mechatronic system, it is state of the art to manually calibrate specific release conditions during which the condition monitoring is enabled. This is done to exclude operating points which are very rare, too transient or just not feasible for drawing conclusions about the condition of the mechatronic system. But even restricting the conditions under which to diagnose the machine (which already reduces the probability of diagnosing the machine at all, due to an operating state which is by chance outside the release conditions) cannot always improve the fault detection, identification and quantification of its magnitude. For example, in a mechatronic system with a complex nonlinear dependency of its subcomponents and its time dependency, ''going in'' or ''going out'' of the release conditions can result in very different system behavior. Comparing these two states does not lead to reliable conclusions about the mechatronic system's health condition.

Therefore, the kind of data sequences used for training/calibration and validation is crucial for any monitoring strategy. Manually screening, labeling and sorting data into comparable sequences is time-consuming, error-prone and cumbersome. Additionally, this is a decision process which requires expert and domain knowledge.

The new algorithm which we introduce in this work is capable of generating subsequence models from online streaming data which is processed sequentially. Any coherent subsequence that is identified can be recognized (clustered) if it occurs again. Depending on multiple sensitivity calibration parameters, time-varying data points are associated and identified as a subsequence. The parameters determine the volatility or the strength of the affiliation required to be recognized as one time-varying subsequence. These subsequence models can also be applied efficiently offline onto large existing datasets. During this prediction phase, the algorithm provides a vector of subsequence labels which were recognized as one from the training data. Depending on the calibration, it can also provide a label for unknown data which represents a phase where no pattern could be recognized. Otherwise, it finds the best fitting subsequence and labels it as that. The approach published in this work is currently only based on MTS input but could be adapted for univariate input. It is a multivariate time-series subsequence discovery and identification method.

Our contribution is a new algorithm for online subsequence clustering of MTSD called ''Autoencoder-based Iterative Modeling and Subsequence Clustering Algorithm (ABIMCA)'' and a new metric to evaluate cluster algorithms focused on this task, the ''Multivariate Time-Series Sub-Sequence Clustering Metric (MT3SCM)''. We compare our algorithm with
• seven other state-of-the-art algorithms
• eight datasets, of which six are publicly available and two are provided with our codebase
• three widely used unsupervised clustering metrics
• our own metric (MT3SCM) and its four components
while varying the use of default algorithm parameters with optimized parameters on each algorithm and dataset via random grid search.

II. RELATED WORK
In this section we define the terminology used and its semantics to categorize our work within the large bibliography existing in this field, and provide a selected list of related works and their ascendancy to this paper.

A. TERMINOLOGY AND SEMANTICS
Numerous possibilities have been described for achieving our main goal of segmenting discrete time-series sensory data. Most approaches can be sorted into the following partially overlapping categories: time-series analysis [3], pattern recognition [4], temporal knowledge discovery [5], motif discovery [6], change-point detection [7], data clustering [8] or anomaly detection [9]. All those terms refer to methods or algorithms which could be used directly or indirectly to achieve our goal. An explicit description of each term or category can be found in the stated references.

To limit the scope of this work, we focus on data clustering, which can be separated into six subcategories by the
perceptron (MLP), convolutional neural network (CNN), deep belief network (DBN), generative adversarial network (GAN) and variational autoencoder (VAE), among others [29].

The algorithms we use for comparison are listed in Table 1. All of these are online clustering algorithms that can be used for time-series clustering. Implementations are publicly available in the Python programming language (see library column in Table 1) and are well established and tested.

The Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) [16] algorithm is based on a clustering features (CF) tree, with the CF as a triple of the number of data points, their linear sum and their squared sum. This CF tree is built dynamically. It was also one of the earliest algorithms capable of online clustering. The Bayesian Online Change-point Detection (BOCPD) [17] algorithm, as the name suggests, uses Bayesian methods to detect change-points (CPs) online. Since this algorithm only detects CPs, we manipulated the result to be able to interpret every CP as the beginning of a new cluster. This algorithm therefore starts in our comparison with the limitation of not being able to recognize a previously seen cluster. The Stream Clustering Framework (CluStream) [19] algorithm is based on extended CF from BIRCH, followed by a k-means algorithm. The Density-based Stream Clustering (DBSTREAM) [20] algorithm is based on the Self Organizing density-based clustering over data Stream (SOStream) [30] and uses a shared density graph to capture the density between micro-clusters. The Density-Based Clustering over an Evolving Data Stream with Noise (DenStream) [21] algorithm is an extension of Density-Based Spatial Clustering of Applications with Noise (DBSCAN) [31] which uses a damped window model of CF to create core-micro-clusters and outlier-micro-clusters. The Mini-Batch K-Means (MiniBatchKMeans) [22] algorithm proposes ''the use of mini-batch optimization for k-means clustering'' ([22]) to improve the k-means optimization problem. The STREAMKMeans [18] algorithm uses an adaptation of the original STREAM algorithm from [32], replacing the k-median subroutine LSEARCH by an incremental k-means algorithm. More information on and a comparison of most of the used algorithms can be found in [26] and [33].

As described in section I, the focus of this publication is on online multivariate time-dependent subsequence clustering using RNN-based AE. The number of algorithms within this scope is limited compared to the number of clustering algorithms in general. The following approaches use at least some of those prerequisites. Reference [34] emphasizes the term segmentation for an offline sliding window and bottom-up algorithm. Others convert the time-series into a Markov chain (MC) and then use a Bayesian method to cluster the MCs [35], referring to them as episodes. Here the data needs to be discretized into bins of equal length. Reference [36] uses manually selected characteristics (e.g., kurtosis, skewness and frequency) for clustering univariate TSD. Others use the Augmented Dickey-Fuller test to evaluate time-series stationarity and perform a segmentation based on it [37]. In [38] dynamic latent variables from a vector autoregression (VAR) model in combination with a principal component analysis (PCA) are used for segmenting industrial TSD. Reference [39] shows the advantage of an embedding approach as well, by introducing a PCA and a Vanilla-AE CP detection method with the restriction of focusing on multivariate power grid data.

Focused on transfer learning, [40] introduces an adversarial approach for domain adaptation using a stacked AE. An offline convolutional sparse AE used for supervised sequence classification was presented by [41] and adapted by [42] for unsupervised motif mining. Other AE-based papers are, for example, by [43], who use a mixture of AEs for image and text clustering. Stacked AE and k-means for offline clustering is done by [44] without considering time dependency. The combination of GRU-based AE and MTS for anomaly detection is shown by [45]. Reference [46] applies the sliding window approach on a CNN-based AE for anomaly detection of industrial robots. A similar approach using AE for MTS segmentation is published in [47]. The focus there is on change point detection using latent space variables only, and no clustering or identification of the subsequences is done.

Clustering strongly depends on the data and task provided: ''. . . ; each new clustering algorithm performs slightly better than the existing ones on a specific distribution of patterns.'' ([8, p. 268]). Therefore, we try to apply the algorithms on multiple different MTS datasets and compute different metrics for comparison. Large efforts are made by universities and governmental institutions to make datasets available to the scientific community and the public, to improve comparability and reproducibility [48], [49]. For this paper we focus on data with multivariate quantitative features with continuous values. For a list of the datasets see Table 3; a brief description is given in section IV.

Evaluating the performance of a clustering algorithm can be done with two different approaches. If external knowledge about the ground truth of each data point and its cluster is known, then so-called external measures can be applied. If no ground truth is available, internal measures need to suffice. Many external measures exist, like the well-known F1-score (based on the effectiveness measure by [50]). With the large amount of data available, and working in the context of transient machine behavior with the focus on finding internal states of the system, acquiring or providing the ground truth is time-consuming, error-prone and cumbersome (as described in section I). ''The definition of clusters depends on the user, the domain, and it is subjective.'' ([25, p. 30]). We therefore use internal measures for comparing our approach. Those internal measures commonly rely on a similarity measure of the actual data which is being clustered. Thorough work on metric comparison and similarity measures has been done in [25], [51], [52]. Most of those measures are based on simple distances and densities computed for each data point but do not take time dependency into consideration. Because of
this, we found that, for the use case described in this paper, the commonly used clustering evaluation measures are not well suited as time-series clustering evaluation measures. In section V we introduce an approach for similarity measures which considers time dependency in combination with well-established clustering metrics (see Table 2).

TABLE 2. Metrics used for time-series clustering comparison. Implementations used from [15].

III. PREREQUISITES
A single data point at time t with d features is denoted as

xt = (x0, x1, . . . , xd)T with x ∈ R and d ∈ N (1)

whereas the natural numbers include zero, {0, 1, 2, . . .} = N. A complete measurement sequence with n time steps using xt from Equation (1) is denoted as

X = (x0, x1, . . . , xn) with n ∈ N (2)

so X ∈ Rd×n. With time dependency consideration, it is reasonable to denote a sliding window of the streaming data, considering Equation (1) and n the number of samples already collected, as:

Wt = (xt+0, xt+1, . . . , xt+ζ) with (ζ ∈ N) ∧ (ζ < n) (3)

For the output of a clustering algorithm at time t we denote the scalar value yt as our label or designated subsequence identification. For evaluation purposes a clustering for a time-series produces a label array y for all time steps:

y = (y0, y1, . . . , yn) with y ∈ J and n ∈ N (11)

Furthermore, it is a requirement that the streaming data provided can be applied to a numerical differentiation algorithm. Therefore, a constant sample rate is necessary, and in case of strong noise, filtering or smoothing of the data should be applied in a preprocessing step. Also, there must not be missing values, and extreme outliers need to be removed.

4 https://sites.cc.gatech.edu/~borg/ijcv_psslds/
5 https://data.nasa.gov/dataset/C-MAPSS-Aircraft-Engine-Simulator-Data/xaut-bemq
6 http://www.timeseriesclassification.com/description.php?Dataset=EigenWorms
7 https://archive.ics.uci.edu/ml/datasets/Condition%20monitoring%20of%20hydraulic%20systems
8 http://mocap.cs.cmu.edu/
9 https://github.com/LuisM78/Occupancy-detection-data
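To make this notation concrete, the following minimal NumPy sketch (our illustration only; the array shapes mirror Equations (1)-(3) and (11)) shows how X, Wt and y can be represented:

import numpy as np

d, n = 3, 1000               # number of features and number of time steps
zeta = 20                    # sliding window length parameter, zeta < n

X = np.random.randn(d, n)    # complete measurement sequence, Eq. (2): X in R^(d x n)

def sliding_window(X: np.ndarray, t: int, zeta: int) -> np.ndarray:
    """Return Wt = (x_t, ..., x_(t+zeta)) as in Eq. (3)."""
    return X[:, t : t + zeta + 1]

Wt = sliding_window(X, t=0, zeta=zeta)   # shape (d, zeta + 1)

# A clustering algorithm assigns one integer label y_t per time step,
# producing the label array y of Eq. (11).
y = np.zeros(n, dtype=int)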
IV. DATASETS
All datasets used for comparison in this work are described briefly in this section and listed in Table 3. They all contain quantitative features with continuous values. For further use of the datasets, no missing values exist, the data is continuous and was standardized for the algorithms but not for the metric computations. No other preprocessing like smoothing or filtering was performed.

The bee-waggle dataset [56] contains movement of bees in a hive captured with a vision-based tracker. The first two features are the x and y coordinates of the bee, complemented by the sine and the cosine of the heading angle.

The cmapss dataset is a ''dataset of run-to-failure trajectories for a small fleet of aircraft engines under realistic flight conditions'' [57] with 18 features.

The eigen-worms dataset [58] contains measurements of worm motion. Preprocessing extracted six features, which represent the amplitudes along six previously identified base shapes of the worms.

The hydraulic dataset [59] is obtained from a hydraulic test rig measuring 17 process values such as pressures, volume flows and temperatures.

Lorenz-attractor refers to a synthetic dataset which is calculated using a system of three coupled ordinary differential equations which represent a hydrodynamic system: Ẋ = s(Y − X); Ẏ = rX − Y − XZ; Ż = XY − bZ with parameters s = 10, r = 28 and b = 2.667 (see Figure 2). ''In these equations X is proportional to the intensity of the convective motion, while Y is proportional to the temperature difference between the ascending and descending currents, similar signs of X and Y denoting that warm fluid is rising, and cold fluid is descending.'' [60]

FIGURE 2. Lorenz-attractor dataset. Computed with Ẋ = s(Y − X); Ẏ = rX − Y − XZ; Ż = XY − bZ and parameters s = 10, r = 28 and b = 2.667. Color and marker size indicate the amount of curvature on a logarithmic scale for better visibility.
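For illustration, such a synthetic attractor dataset can be generated by numerically integrating the stated equations, for example with SciPy (a sketch of ours, not the published preprocessing code; time span and initial conditions are chosen arbitrarily):

import numpy as np
from scipy.integrate import solve_ivp

def lorenz(t, state, s=10.0, r=28.0, b=2.667):
    # Right-hand side of the Lorenz system with the parameters stated above.
    x, y, z = state
    return [s * (y - x), r * x - y - x * z, x * y - b * z]

# Constant sample rate, as required by the prerequisites in section III.
t_eval = np.linspace(0.0, 40.0, 4000)
sol = solve_ivp(lorenz, (0.0, 40.0), [0.0, 1.0, 1.05], t_eval=t_eval)
X = sol.y   # shape (3, 4000): one MTS with the features X, Y and Z

The thomas-attractor dataset described below can be generated analogously by swapping in its right-hand side.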
The mocap or Motion Capture Database (MOCAP) dataset [61] contains 93 features from human motion captured with markers.

The occupancy dataset [62] is a measurement of sensory data in an office with the following sensors: temperature, humidity, the derived humidity ratio, light and CO2.

The thomas-attractor dataset is, like the lorenz-attractor dataset, a synthetic dataset, computed with the three coupled differential equations Ẋ = sin(Y) − bX; Ẏ = sin(Z) − bY; Ż = sin(X) − bZ, originally proposed by [63], with the parameter b = 0.1615.
V. MULTIVARIATE TIME-SERIES SUB-SEQUENCE CLUSTERING METRIC (MT3SCM)
Our metric targets subsequences in MTSD in general (in regard to the restrictions in section III). Our MT3SCM score consists of three main components:

mt3scm = (ccw + sL + sP)/3 (12)

the weighted curvature consistency (ccw), the location-based silhouette (sL) and the curve-parameter-based silhouette (sP). Any attempt at clustering TSD is subjective and domain specific. Nevertheless, we try to take the intuitive approach of treating MTSD as space curves and use their parameterization as a similarity measure. This is done in two different ways. First, we create new features by computing the curve parameters sample by sample (e.g., curvature, torsion, acceleration) and determine their standard deviation for each cluster. Our hypothesis is that, with a low standard deviation of the curve parameters inside a cluster, the actions of a mechatronic system in this cluster are similar. We call this the curvature consistency (cc) (see Equation (24), used in line 14 of algorithm 1). The second procedure is to apply these newly computed features, reduced to scalar values per subsequence, to a well-established internal clustering metric, the silhouette score [53] (see Table 2).

The computation of the cc comprises the calculation of the curvature κ and the torsion τ at every time step t with xt. A good clustering additionally separates each subsequence in the original feature space (accomplished by sL). The proposed algorithm for this new metric's computation is described in algorithm 1.

Algorithm 1 MT3SCM
1: procedure MT3SCM(X, y) ▷ Data X ∈ Rd×n and labels y ∈ Nn
2:   L ← empty() ▷ Array initialization for all subsequence median coordinates or Locations
3:   P ← empty() ▷ Array initialization for all subsequence curve parameter mean values
4:   K ← GetCurveParametersForAllData(X)
5:   yunique ← FindUniqueClusterIDs(y)
6:   for i in yunique do
7:     Xi ← GetClusterData(X, i)
8:     s ← FindSubsequences(y, i)
9:     for j in s do
10:      Xi,j ← GetSubsequenceData(Xi, j)
11:      L[i, j] ← GetMedianLocations(Xi,j)
12:      P[i, j] ← GetCurveParameterValues(K, i, j)
13:    end for
14:    cci ← ClusterCurvatureConsistency(P) ▷ Compute the cluster curvature consistency (cci) with the empirical standard deviation of each curve parameter over time. If the cluster consists of only one time step, set the cci to zero.
15:    C[i] ← cci ▷ Collect cci data for all clusters
16:  end for
17:  ccw ← WeightedAverage(C, npc) ▷ Compute the weighted average curvature consistency (ccw) from cci with the number of points per cluster
18:  sL ← SilhouetteComputation(L, yunique) ▷ Compute the silhouette coefficient using the center positions of each identified subsequence
19:  sP ← SilhouetteComputation(P, yunique) ▷ Compute the silhouette coefficient with the curve parameters
20:  score ← (ccw + sL + sP)/3
21:  return score ▷ The final score
22: end procedure
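To make the curve-parameter features concrete, the following simplified sketch (ours; the exact definitions are given by Equation (24) and the implementation at [64]) approximates the curvature κ of a three-dimensional MTS treated as a space curve and derives a per-cluster consistency value from its spread:

import numpy as np

def curvature(X: np.ndarray) -> np.ndarray:
    """Curvature per time step of a space curve X with shape (n, 3):
    kappa = ||x' x x''|| / ||x'||^3, derivatives via finite differences."""
    d1 = np.gradient(X, axis=0)                      # first derivative
    d2 = np.gradient(d1, axis=0)                     # second derivative
    num = np.linalg.norm(np.cross(d1, d2), axis=-1)
    den = np.linalg.norm(d1, axis=-1) ** 3 + 1e-12   # avoid division by zero
    return num / den

def curvature_consistency(kappa: np.ndarray, y: np.ndarray) -> dict:
    # Simplified stand-in for the cluster curvature consistency: the
    # empirical standard deviation of the curve parameter per cluster
    # (the exact mapping to a consistency score is given by Eq. (24)).
    return {c: float(np.std(kappa[y == c])) for c in np.unique(y)}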
A. EVALUATION
For computational tests, we manually created a ''perfect'' synthetic dataset with respect to our metric (see Figure 4). Figure 4 (a) shows the original synthetic dataset, where the subsequences in cluster 1 are a helix along the increasing x axis. For cluster 2 the subsequences are a straight movement with quadratically decreasing distances along the y axis. Cluster 3 represents a helix along the decreasing x axis but with a different resolution than cluster 1. Cluster 4 is, along with cluster 2, a straight movement with quadratically increasing distances along the y axis. This cycle is repeated six times. Figure 4 (b) shows the new feature space for the sL component. The feature space for the sP component is shown in Figure 4 (c). Applying the new features per subsequence to the standard metrics results in best scores for all metrics. This shows that the new feature space allows a good separation in contrast to the original space, as proven by the metrics scores for silhouette, calinski-harabasz and davies-bouldin on the original and the two new feature spaces. To show the benefit of the new feature space, we applied the agglomerative clustering10 not on the original lorenz-attractor dataset but on the newly computed feature space based on curvature, torsion and acceleration (see Figure 5). The metric values for Figure 5 (b) show a high ccw and a decent sP value for the low number of 10 clusters specified.

To further evaluate our metric, we used the lorenz-attractor and the thomas-attractor dataset (see Table 3) and applied an agglomerative clustering, a time-series k-means clustering as well as a random subsequence clustering, varying the number of clusters and some algorithm-specific parameters. Afterwards the calinski-harabasz, davies-bouldin and silhouette scores were computed and compared to our new metric MT3SCM. From these results we derived a correlation matrix (see Figure 6). The cc and the ccw are clearly related due to their direct combination. The positive correlation between the internal components and the overall MT3SCM score is obvious. We see a clear positive correlation to the silhouette score, which is evident due to the internal use of this metric. Interestingly, the correlation between the ccw and the sP is negative. This is due to the types of datasets and algorithms we used: with a higher number of clusters we theoretically expect a better cc because of the lower standard deviation by chance. On the other hand, the more clusters exist, the more likely a similar curve parameter between the clusters exists, which creates a new feature space with overlapping clusters and results in a low sP score. This can be retraced within the subfigures of Figure 7. The low correlation between the calinski-harabasz and the davies-bouldin scores supports our point that the available clustering metrics are not well suited to be used as time-series clustering evaluation measures. Figure 7 shows examples where the agglomerative clustering was applied on the lorenz-attractor dataset (part of the data used for the correlation matrix in Figure 6). It can be seen that the agglomerative clustering on the original dataset is not an optimal cluster algorithm when comparing the metrics to Figure 5 (b). Comparing Figure 5 (b) and Figure 7 (d), we can see a similar MT3SCM score but very different standard metrics scores. The similar MT3SCM score is based on the much higher number of clusters and equally distributed subsequence length in Figure 7 (d), which results in a high ccw value as well as a good spatial separation (sL), which compensates the low sP value due to the similar curve parameters of the clusters. Figure 5 (b), however, also has a very high ccw value with a good sP value, reaching a similar MT3SCM score but with a fifth of the number of clusters. How our metric handles random clustering in critical scenarios is shown in Figure 8. The Python code and a more detailed evaluation are publicly available at [64].

10 https://scikit-learn.org/stable/modules/generated/sklearn.cluster.AgglomerativeClustering.html
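The standard scores of Table 2 can be computed directly with scikit-learn [15]; a minimal sketch (ours) for a random clustering baseline, as used in Figure 8, looks as follows:

import numpy as np
from sklearn.metrics import (silhouette_score, calinski_harabasz_score,
                             davies_bouldin_score)

X = np.random.randn(500, 3)               # stand-in MTS, shape (n, d)
y = np.random.randint(0, 4, size=500)     # random clustering baseline

print(silhouette_score(X, y))             # in [-1, 1], higher is better
print(calinski_harabasz_score(X, y))      # higher is better
print(davies_bouldin_score(X, y))         # lower is better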
FIGURE 4. Synthetic dataset with four clusters and a perfect own-metric score of mt3scm = 1 due to each cluster's unique and constant curve parameters. (a) Synthetic dataset with best own result of mt3scm = 1. Standard metrics scores computed with original data: davies-bouldin: 1.4, calinski-harabasz: 6.9e+02, silhouette: 0.087. (b) New feature space from the centers (median value) of each subsequence. Standard metrics scores computed with new feature space: davies-bouldin: 6.3e−07, calinski-harabasz: 3.8e+14, silhouette: 1. (c) New feature space from the curve parameters extracted from each subsequence. Standard metrics scores computed with new feature space: davies-bouldin: 6.8e−07, calinski-harabasz: 1.2e+13, silhouette: 1.
B. CONCLUSION
We have described a more suitable similarity measure for time-dependent TSD, shown how to compute our metric, and evaluated its use case and effectiveness on different datasets. Further, we will use this metric in addition to the standard metrics to evaluate our proposed online time-series clustering algorithm, which is described in section VI.

VI. CLUSTERING ALGORITHM (ABIMCA)
In this section we describe the concept of our time-series clustering approach in detail. Afterwards, we apply our algorithm to the datasets described in section IV and present the results.

A. METHOD
As described in [23], a key component in a time-series clustering algorithm is the similarity function to quantify the clustering criteria. Common similarity functions used are distance measures like the euclidean distance or correlation coefficients like Pearson's correlation coefficient. Those are also used for static data clustering algorithms. More suitable for time-series clustering are similarity functions like the Dynamic Time Warping (DTW) distance, the short time-series (STS) distance [65], or space-curve-based measures like the one we introduced in section V.

In this work we analyzed an approach which is data driven, based on unsupervised machine learning algorithms and has online capabilities (see Figure 11). Our approach uses an RNN-based AE to generate scores which are used as similarity measures. Specifically, the experiments in this work were performed using a pytorch [66] implementation of a bidirectional one-layer gated recurrent unit (GRU) RNN with a hidden size of the input dimensions minus one, h = d − 1. Other prerequisites regarding the dataset and preprocessing are described in section IV and section III.
FIGURE 6. Own metric (MT3SCM) correlation analysis. Own metric and its four subcomponents (curvature consistency (cc), weighted curvature consistency (ccw), silhouette location based (sL), silhouette curve-parameter based (sP)): correlation to calinski-harabasz, davies-bouldin and silhouette score for random, agglomerative and k-means clustering on the lorenz- and thomas-attractor datasets.

FIGURE 8. Own metric evaluation using a random clusterer on the thomas-attractor and lorenz-attractor datasets. See Table 4 for a metric comparison of the following subplots. (a) Own metric and all of its subcomponents are around zero, as desired. The calinski-harabasz value is low and davies-bouldin is high, which also indicates a ''bad'' clustering. (b) Longer random subsequences also generate an MT3SCM result around zero. Calinski-harabasz and davies-bouldin scores are more strongly influenced by the subsequence length. (c) Own metric and all of its subcomponents are around zero, as desired. The calinski-harabasz value is low and davies-bouldin is high, which also indicates a ''bad'' clustering. (d) As seen for the lorenz-attractor data in (b), longer subsequences have a high impact on calinski-harabasz and davies-bouldin scores.

The main procedure of the approach is as follows: the incoming data is taken as a sliding window Wt at the current time t with length ζ of past time steps and number of features d. This matrix Wt ∈ Rd×ζ is used as the input of what we call the Base Autoencoder (BAE). The key element of our algorithm is that this BAE's parameters are not constant but are adapted iteratively with a stochastic gradient descent (SGD) optimization method for each new incoming sliding window. For this training of the BAE, we use a slight adaptation of the sparse AE loss function L from [67] with a basic regularization term or sparsity penalty Ω(h):

loss = l = L(Wt, W̃t, h) = MSE(Wt, W̃t) + Ω(h) (30)

where W̃t denotes the BAE's reconstruction of Wt and h the latent representation.
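A minimal PyTorch sketch of this iterative update (our illustration: the hidden size follows the stated h = d − 1, while the L1 sparsity penalty and the learning rate are assumptions; the published implementation is available at [68]):

import torch
import torch.nn as nn

class BaseAE(nn.Module):
    """GRU-based autoencoder: encode a sliding window, reconstruct it."""
    def __init__(self, d: int):
        super().__init__()
        self.encoder = nn.GRU(d, d - 1, num_layers=1,
                              bidirectional=True, batch_first=True)
        self.decoder = nn.GRU(2 * (d - 1), d, num_layers=1, batch_first=True)

    def forward(self, w):                  # w: (1, zeta, d)
        h, _ = self.encoder(w)             # latent sequence h
        w_rec, _ = self.decoder(h)         # reconstruction of Wt
        return w_rec, h

d, lam = 3, 1e-4
bae = BaseAE(d)
opt = torch.optim.SGD(bae.parameters(), lr=0.01)
mse = nn.MSELoss()

def training_step(w_t: torch.Tensor) -> float:
    # One iterative BAE update per incoming sliding window, Eq. (30):
    # loss = MSE(Wt, reconstruction) + Omega(h); Omega is an assumed L1 penalty.
    w_rec, h = bae(w_t)
    loss = mse(w_t, w_rec) + lam * h.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()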
FIGURE 11. Concept of the ABIMCA approach. A sliding window Wt of the MTS is iteratively trained in the base AE. If the score of the base AE (gray dotted line) is below the threshold (dashed red line), a new subsequence AE is created from the base AE. Incoming data is also compared to the existing subsequence AEs so that a subsequence can be recognized.
TABLE 7. Best metric ''metrics.mt3scm'' value for each dataset and algorithm from hyperparameter search results.

TABLE 8. Best metric ''metrics.calinski-harabasz'' value for each dataset and algorithm from hyperparameter search results.

The bottom row of Figure 9 shows that a subsequence is present when the BAE's score (blue line) is below the horizontal black line (sb <= η). If a subsequence is present, a copy of the BAE is made and its parameters are frozen and associated with this specific pattern of a subsequence. These copies of the BAE, which we call Subsequence Autoencoders (SAE), are used to recognize previously seen subsequences using the same scoring function. A concept drawing of the approach is shown in Figure 11. The algorithm is described in pseudocode in algorithm 2.
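Continuing the PyTorch sketch above, freezing a copy of the BAE as a new SAE could look like this (again an illustration, not the exact implementation):

import copy

def spawn_subsequence_ae(bae: BaseAE) -> BaseAE:
    # When the BAE score stays below the threshold eta, snapshot the BAE;
    # the frozen copy becomes the SAE associated with the current subsequence.
    sae = copy.deepcopy(bae)
    for p in sae.parameters():
        p.requires_grad_(False)
    return sae.eval()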
TABLE 9. Best metric ''metrics.davies-bouldin'' value for each dataset and algorithm from hyperparameter search results.

TABLE 10. Best metric ''metrics.silhouette'' value for each dataset and algorithm from hyperparameter search results.

TABLE 11. Metric ''metrics.mt3scm'' value for each dataset and algorithm from default calibration results.

TABLE 12. Metric ''metrics.calinski-harabasz'' value for each dataset and algorithm from default calibration results.

TABLE 13. Metric ''metrics.davies-bouldin'' value for each dataset and algorithm from default calibration results.

TABLE 14. Metric ''metrics.silhouette'' value for each dataset and algorithm from default calibration results.

The functionality can be retraced considering Figures 9 and 10. This example shows the algorithm applied on a three-dimensional synthetic data set. The input data consists of four different operation points with small white noise. The sequence of the four subsequences is repeated once. The other rows are described in the caption of Figure 9. It is evident that the algorithm needs a few time steps to adapt to the current subsequence until it is recognized as such. Recognizing a previously identified subsequence, however, is almost instantaneous. The calibration of the thresholds η (horizontal black line) and ζ (horizontal gray line) is apparently crucial. The necessary time steps to adapt to a current subsequence can be altered by the calibration of the learning rate α and the number of BAE training cycles
per time step ω. A faster recognition of a subsequence has the drawback of the algorithm being very sensitive and therefore identifying even small changes of the input as a new subsequence. A strategy could be to calibrate the algorithm first to be rather insensitive and cluster the time-series into major subsequences. These can then be further clustered with a more sensitive calibration. This procedure can be repeated until the required degree of granularity is achieved.

B. EVALUATION
For the evaluation study of our algorithm, we chose eight different MTS datasets (see Table 3), of which six are publicly available and two are provided with our codebase [68], seven other state-of-the-art algorithms (see Table 1) and three widely used unsupervised clustering metrics (see Table 2). Each algorithm has been applied to each dataset with default parameters. Additionally, we performed a hyperparameter search for each algorithm based on a random grid search of 300 samples. The parameter boundaries for this hyperparameter search are listed in Table 15. Overall, 19 264 experiments were run.

For a better overview of the results, we chose to compare every algorithm to the ''MiniBatchKMeans'' algorithm and counted the number of times they performed better. Table 5 shows the results for the hyperparameter search and the number of outperformances of each algorithm compared to the ''MiniBatchKMeans'' algorithm. Table 6 shows the same results with default parameter settings for each algorithm. We can see that, in sum and in two of the metrics, our algorithm beats the state-of-the-art algorithms. The full list of results is attached in Tables 7 to 14. Additionally, Table 16 shows the best results from the hyperparameter search when sorted by MT3SCM, with the total time spent.11

FIGURE 12. Empirical complexity estimation with variation of the total number of datapoints and subsequences identified by our algorithm over the duration.

11 Experiments were performed on a Linux machine with an AMD Ryzen Threadripper 2950X 16-Core Processor using a GeForce RTX 2080 GPU.
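Such a random grid search can be set up, for example, with scikit-learn's ParameterSampler; the parameter names and ranges below are placeholders, the actual bounds are listed in Table 15:

from scipy.stats import loguniform, randint
from sklearn.model_selection import ParameterSampler

param_dist = {
    "learning_rate": loguniform(1e-4, 1e-1),   # alpha
    "training_cycles": randint(1, 20),         # omega
}
for params in ParameterSampler(param_dist, n_iter=300, random_state=0):
    ...  # run the algorithm with `params` on each dataset and log all metrics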
Algorithm 2 ABIMCA. Using the following parameters: α: learning rate, ω: number of base model training cycles.

TABLE 15. Hyperparameter random grid search upper and lower bound for each algorithm and their specific parameter options.

TABLE 16. Metrics from hyperparameter search when sorted for best value of mt3scm metric.
For every subsequence identified, the algorithm checks if the new incoming data is already known by comparing it to the previously identified subsequences. A linear correlation of the duration for every datapoint with the number of subsequences identified is therefore present. The complexity of our algorithm is therefore O(NS), where N is the number of datapoints and S is the number of subsequences identified. Considering the worst-case scenario of identifying every new datapoint as a new subsequence, the duration for the algorithm increases quadratically with the number of datapoints, which results in a complexity of O(n²).

As cited before, every algorithm performs differently on a specific distribution of patterns, and the hyperparameter search was a simple random grid search of ''only'' 300 samples, so these results are unlikely to represent the optimal solution for each algorithm on each dataset. Nevertheless, we demonstrate that the algorithm we present in this work is highly effective at detecting subsequences online in a MTS.

The only data retained after training is the parameters of the subsequence-specific AEs. It is a completely unsupervised method which can cluster online data. In the context of CbM, the once-identified subsequence AEs can be used for deviation quantification of the underlying system. This can be used for deterioration analysis and maintenance strategies. Further investigations for improving the ABIMCA method would be to explore different kinds of AEs like feedforward neural networks (FNN), CNNs or a combination of such. Also, a VAE could be reasonable depending on the underlying process. Future work should analyze the effect of reducing the latent space dimension by multiple factors of the input dimension (when the input dimension is very high). This could reduce computation costs and improve representation learning without performance loss. A detailed analysis of the optimal default parameters, or a generic automatic calibration depending on some statistics of the expected input, could increase performance and decrease calibration efforts.
REFERENCES
[4] C. M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics), 1st ed. New York, NY, USA: Springer, 2016.
[5] J. F. Roddick and M. Spiliopoulou, ''A survey of temporal knowledge discovery paradigms and methods,'' IEEE Trans. Knowl. Data Eng., vol. 14, no. 4, pp. 750–767, Jul. 2002.
[6] J. Lin, E. Keogh, S. Lonardi, and P. Patel, ''Finding motifs in time series,'' in Proc. 2nd Workshop Temporal Data Mining, 2002, pp. 1–11.
[7] G. J. J. van den Burg and C. K. I. Williams, ''An evaluation of change point detection algorithms,'' 2020, arXiv:2003.06222.
[8] A. K. Jain, M. N. Murty, and P. J. Flynn, ''Data clustering: A review,'' ACM Comput. Surv., vol. 31, no. 3, pp. 264–323, Sep. 1999.
[9] V. Chandola, A. Banerjee, and V. Kumar, ''Anomaly detection: A survey,'' ACM Comput. Surv., vol. 41, no. 3, pp. 1–58, Jul. 2009.
[10] R. Agrawal and R. Srikant, ''Mining sequential patterns,'' in Proc. 11th Int. Conf. Data Eng., Mar. 1995, pp. 3–14.
[11] D. Xu and Y. Tian, ''A comprehensive survey of clustering algorithms,'' Ann. Data Sci., vol. 2, no. 2, pp. 165–193, 2015.
[12] M. Lovrić, M. Milanović, and M. Stamenković, ''Algorithmic methods for segmentation of time series: An overview,'' J. Contemp. Econ. Bus. Issues, vol. 1, no. 1, pp. 31–53, 2014. [Online]. Available: http://hdl.handle.net/10419/147468
[13] S. Torkamani and V. Lohweg, ''Survey on time series motif discovery,'' Wiley Interdiscipl. Rev., Data Mining Knowl. Discovery, vol. 7, no. 2, p. e1199, Mar. 2017.
[14] S. Aminikhanghahi and D. J. Cook, ''A survey of methods for time series change point detection,'' Knowl. Inf. Syst., vol. 51, no. 2, pp. 339–367, 2017.
[15] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, A. Müller, J. Nothman, G. Louppe, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, ''Scikit-learn: Machine learning in Python,'' J. Mach. Learn. Res., vol. 12, pp. 2825–2830, Oct. 2011.
[16] T. Zhang, R. Ramakrishnan, and M. Livny, ''BIRCH: An efficient data clustering method for very large databases,'' in Proc. ACM SIGMOD Int. Conf. Manage. Data, vol. 25, no. 2, 1996, pp. 103–114.
[17] R. Prescott Adams and D. J. C. MacKay, ''Bayesian online changepoint detection,'' 2007, arXiv:0710.3742.
[18] J. Montiel, M. Halford, S. Martiello Mastelini, G. Bolmier, R. Sourty, R. Vaysse, A. Zouitine, H. Murilo Gomes, J. Read, T. Abdessalem, and A. Bifet, ''River: Machine learning for streaming data in Python,'' 2020, arXiv:2012.04740.
[19] C. C. Aggarwal, P. S. Yu, J. Han, and J. Wang, ''A framework for clustering evolving data streams,'' in Proc. VLDB Conf. Elsevier, 2003, pp. 81–92.
[20] M. Hahsler and M. Bolaños, ''Clustering data streams based on shared density between micro-clusters,'' IEEE Trans. Knowl. Data Eng., vol. 28, no. 6, pp. 1449–1461, Jun. 2016.
[21] F. Cao, M. Estert, W. Qian, and A. Zhou, ''Density-based clustering over an evolving data stream with noise,'' in Proc. SIAM Int. Conf. Data Mining, J. Ghosh, D. Lambert, D. Skillicorn, and J. Srivastava, Eds. Philadelphia, PA, USA: Society for Industrial and Applied Mathematics, 2006, pp. 328–339.
[22] D. Sculley, ''Web-scale K-means clustering,'' in Proc. 19th Int. Conf. World Wide Web (WWW), New York, NY, USA, M. Rappa, P. Jones, J. Freire, and S. Chakrabarti, Eds. 2010, p. 1177.
[23] T. W. Liao, ''Clustering of time series data—A survey,'' Pattern Recognit., vol. 38, no. 11, pp. 1857–1874, 2005.
[24] S. Zolhavarieh, S. Aghabozorgi, and Y. W. Teh, ''A review of subsequence time series clustering,'' Sci. World J., vol. 2014, Jul. 2014, Art. no. 312521.
[25] S. Aghabozorgi, A. Seyed Shirkhorshidi, and T. Ying Wah, ''Time-series clustering—A decade review,'' Inf. Syst., vol. 53, pp. 16–38, Oct. 2015.
[26] M. Carnein and H. Trautmann, ''Optimizing data stream representation: An extensive survey on stream clustering algorithms,'' Bus. Inf. Syst. Eng., vol. 61, no. 3, pp. 277–297, Jun. 2019.
[27] H.-P. Kriegel, P. Kröger, and A. Zimek, ''Clustering high-dimensional data,'' ACM Trans. Knowl. Discovery Data, vol. 3, no. 1, pp. 1–58, Mar. 2009.
[28] C. Truong, L. Oudre, and N. Vayatis, ''Selective review of offline change point detection methods,'' Signal Process., vol. 167, Feb. 2020, Art. no. 107299.
[29] E. Aljalbout, V. Golkov, Y. Siddiqui, M. Strobel, and D. Cremers, ''Clustering with deep learning: Taxonomy and new methods,'' 2018, arXiv:1801.07648.
[30] C. Isaksson, M. H. Dunham, and M. Hahsler, ''SOStream: Self organizing density-based clustering over data stream,'' in Machine Learning and Data Mining in Pattern Recognition (Lecture Notes in Computer Science), vol. 7376, P. Perner, Ed. Berlin, Germany: Springer, 2012, pp. 264–278.
[31] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, ''A density-based algorithm for discovering clusters in large spatial databases with noise,'' in Proc. 2nd Int. Conf. Knowl. Discovery Data Mining, 1996, pp. 226–231. [Online]. Available: http://dl.acm.org/citation.cfm?id=3001460.3001507
[32] L. O'Callaghan, N. Mishra, A. Meyerson, S. Guha, and R. Motwani, ''Streaming-data algorithms for high-quality clustering,'' in Proc. 18th Int. Conf. Data Eng., 2002, pp. 685–694.
[33] M. Ghesmoune, M. Lebbah, and H. Azzag, ''State-of-the-art on clustering data streams,'' Big Data Analytics, vol. 1, no. 1, Dec. 2016.
[34] E. Keogh, S. Chu, D. Hart, and M. Pazzani, ''Segmenting time series: A survey and novel approach,'' in Data Mining in Time Series Databases, vol. 57, M. Last, A. Kandel, and H. Bunke, Eds. Singapore: World Scientific, 2003, pp. 1–21.
[35] M. Ramoni, P. Sebastiani, and P. Cohen, ''Bayesian clustering by dynamics,'' Mach. Learn., vol. 47, no. 1, pp. 91–121, 2002.
[36] X. Wang, K. Smith, and R. Hyndman, ''Characteristic-based clustering for time series data,'' Data Mining Knowl. Discovery, vol. 13, no. 3, pp. 335–364, 2006.
[37] R. P. Silva, B. B. Zarpelão, A. Cano, and S. B. Junior, ''Time series segmentation based on stationarity analysis to improve new samples prediction,'' Sensors, vol. 21, no. 21, p. 7333, Nov. 2021.
[38] S. Lu and S. Huang, ''Segmentation of multivariate industrial time series data based on dynamic latent variable predictability,'' IEEE Access, vol. 8, pp. 112092–112103, 2020.
[39] M. Ceci, R. Corizzo, N. Japkowicz, P. Mignone, and G. Pio, ''ECHAD: Embedding-based change detection from multivariate time series in smart grids,'' IEEE Access, vol. 8, pp. 156053–156066, 2020.
[40] R. Li, S. Li, K. Xu, X. Li, J. Lu, and M. Zeng, ''A novel symmetric stacked autoencoder for adversarial domain adaptation under variable speed,'' IEEE Access, vol. 10, pp. 24678–24689, 2022.
[41] M. Baccouche, F. Mamalet, C. Wolf, C. Garcia, and A. Baskurt, ''Spatio-temporal convolutional sparse auto-encoder for sequence classification,'' in Proc. Brit. Mach. Vis. Conf., R. Bowden, J. Collomosse, and K. Mikolajczyk, Eds. 2012, p. 124.
[42] K. Bascol, R. Emonet, E. Fromont, and J.-M. Odobez, ''Unsupervised interpretable pattern discovery in time series using autoencoders,'' in Structural, Syntactic, and Statistical Pattern Recognition (Lecture Notes in Computer Science), vol. 10029, A. Robles-Kelly, M. Loog, B. Biggio, F. Escolano, and R. Wilson, Eds. Cham, Switzerland: Springer, 2016, pp. 427–438.
[43] S. E. Chazan, S. Gannot, and J. Goldberger, ''Deep clustering based on a mixture of autoencoders,'' in Proc. IEEE 29th Int. Workshop Mach. Learn. Signal Process. (MLSP), Oct. 2019, pp. 1–6.
[44] B. Yang, X. Fu, N. D. Sidiropoulos, and M. Hong, ''Towards K-means-friendly spaces: Simultaneous deep learning and clustering,'' in Proc. 34th Int. Conf. Mach. Learn., D. Precup and Y. W. Teh, Eds., vol. 70, 2017, pp. 3861–3870. [Online]. Available: https://proceedings.mlr.press/v70/yang17b.html
[45] Y. Guo, W. Liao, Q. Wang, L. Yu, T. Ji, and P. Li, ''Multidimensional time series anomaly detection: A GRU-based Gaussian mixture variational autoencoder approach,'' in Proc. 10th Asian Conf. Mach. Learn., J. Zhu and I. Takeuchi, Eds., vol. 95, 2018, pp. 97–112. [Online]. Available: http://proceedings.mlr.press/v95/guo18a.html
[46] T. Chen, X. Liu, B. Xia, W. Wang, and Y. Lai, ''Unsupervised anomaly detection of industrial robots using sliding-window convolutional variational autoencoder,'' IEEE Access, vol. 8, pp. 47072–47081, 2020.
[47] W.-H. Lee, J. Ortiz, B. Ko, and R. Lee, ''Time series segmentation through automatic feature learning,'' 2018, arXiv:1801.05394.
[48] D. Dua and C. Graff. (2017). UCI Machine Learning Repository. [Online]. Available: http://archive.ics.uci.edu/ml
[49] (Sep. 14, 2020). Data.gov. [Online]. Available: https://data.gov/
[50] C. J. van Rijsbergen, Information Retrieval. London, U.K.: Butterworths, 1979.
[51] H. Kremer, P. Kranen, T. Jansen, T. Seidl, A. Bifet, G. Holmes, and B. Pfahringer, ''An effective evaluation measure for clustering on evolving data streams,'' in Proc. 17th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining (KDD), New York, NY, USA, C. Apte, J. Ghosh, and P. Smyth, Eds. 2011, p. 868.
[52] J. Serrá and J. L. Arcos, ''An empirical evaluation of similarity measures for time series classification,'' Knowl.-Based Syst., vol. 67, pp. 305–314, Sep. 2014.
[53] P. J. Rousseeuw, ''Silhouettes: A graphical aid to the interpretation and validation of cluster analysis,'' J. Comput. Appl. Math., vol. 20, no. 1, pp. 53–65, 1987.
[54] T. Caliński and J. Harabasz, ''A dendrite method for cluster analysis,'' Commun. Stat., Theory Methods, vol. 3, no. 1, pp. 1–27, 1974.
[55] D. L. Davies and D. W. Bouldin, ''A cluster separation measure,'' IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-1, no. 2, pp. 224–227, Apr. 1979.
[56] S. M. Oh, J. M. Rehg, T. Balch, and F. Dellaert, ''Learning and inferring motion patterns using parametric segmental switching linear dynamic systems,'' Int. J. Comput. Vis., vol. 77, nos. 1–3, pp. 103–124, May 2008.
[57] M. Arias Chao, C. Kulkarni, K. Goebel, and O. Fink, ''Aircraft engine run-to-failure dataset under real flight conditions for prognostics and diagnostics,'' Data, vol. 6, no. 1, p. 5, Jan. 2021.
[58] E. Yemini, T. Jucikas, L. J. Grundy, A. E. X. Brown, and W. R. Schafer, ''A database of Caenorhabditis elegans behavioral phenotypes,'' Nature Methods, vol. 10, no. 9, pp. 877–879, Sep. 2013.
[59] T. Schneider, N. Helwig, and A. Schütze, ''Automatic feature extraction and selection for classification of cyclical time series data,'' Technisches Messen, vol. 84, no. 3, pp. 198–206, Mar. 2017.
[60] E. N. Lorenz, ''Deterministic nonperiodic flow,'' J. Atmos. Sci., vol. 20, no. 2, pp. 130–141, 1963.
[61] (Apr. 21, 2022). Carnegie Mellon University—CMU Graphics Lab—Motion Capture Library. [Online]. Available: http://mocap.cs.cmu.edu/
[62] L. M. I. Candanedo and V. Feldheim, ''Accurate occupancy detection of an office room from light, temperature, humidity and CO2 measurements using statistical learning models,'' Energy Buildings, vol. 112, pp. 28–39, Jan. 2016.
[63] R. Thomas, ''Deterministic chaos seen in terms of feedback circuits: Analysis, synthesis, 'labyrinth chaos','' Int. J. Bifurcation Chaos, vol. 9, no. 10, pp. 1889–1905, 1999.
[64] J. Köhne. (2022). MT3SCM: Multivariate Time Series Sub-Sequence Clustering Metric. Python. [Online]. Available: https://github.com/Jokonu/mt3scm
[65] C. S. Möller-Levet, F. Klawonn, K.-H. Cho, and O. Wolkenhauer, ''Fuzzy clustering of short time-series and unevenly distributed sampling points,'' in Advances in Intelligent Data Analysis V (Lecture Notes in Computer Science), vol. 2810, M. R. Berthold, H.-J. Lenz, E. Bradley, R. Kruse, and C. Borgelt, Eds. Berlin, Germany: Springer, 2003, pp. 330–340.
[66] A. Paszke et al., ''PyTorch: An imperative style, high-performance deep learning library,'' in Proc. Adv. Neural Inf. Process. Syst., vol. 32, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, Eds. Red Hook, NY, USA: Curran Associates, 2019, pp. 8024–8035. [Online]. Available: http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf
[67] M. Ranzato, C. S. Poultney, S. Chopra, and Y. LeCun, ''Efficient learning of sparse representations with an energy-based model,'' in Proc. NIPS, 2006, pp. 1–8.
[68] J. Köhne. (2022). ABIMCA: Autoencoder Based Iterative Modeling and Subsequence Clustering Algorithm. Python. [Online]. Available: https://github.com/Jokonu/abimca

JONAS KÖHNE was born in Berlin, Germany, in 1983. He received the Dipl.-Ing. degree in energy and process engineering from the Technische Universität Berlin, Germany, in 2014, where he is currently pursuing the Ph.D. degree with the Department of Energy and Automation Technology. From 2014 to 2015, he worked as a Function Development Engineer and an Embedded Software Test Engineer in the automotive field of electrical power steering with Bertrandt AG. From 2015 to 2019, he worked as a Function Developer of powertrain and power engineering with the Department of Commercial Vehicle Electronics, IAV GmbH. Since 2019, he has been a Research Assistant with the Department of Energy and Automation Technology, Technische Universität Berlin. He is also working as a Data Scientist with the Department of Commercial Vehicle Electronics, IAV GmbH. His research interests include anomaly and novelty detection, condition/predictive maintenance strategies of mechatronic systems, analysis of multivariate time-series data using autoencoder, and fault discovery and identification.

LARS HENNING was born in Nauen, Germany, in 1975. He received the Dipl.-Ing. degree in energy and process engineering from the Technische Universität Berlin, Germany, in 2002, and the Ph.D. degree in control engineering, in 2008. Since 2008, he has been working with the Department of Commercial Vehicle Electronics, IAV GmbH, in the area of powertrain and power engineering. Currently, he is a Team Manager of the Software Development Team, with a focus on condition/predictive maintenance strategies of mechatronic systems and machine-learned algorithms for powertrain control.

CLEMENS GÜHMANN was born in Berlin, Germany, in 1962. He received the Dipl.-Ing. degree in electrical engineering from the Technische Universität Berlin, Germany, in 1989, and the Ph.D. degree in pattern recognition and technical diagnosis, in 1995. From 1989 to 1994, he was a Research Assistant with the Institute for General Electrical Engineering. From 1994 to 1995, he worked as a Function Development Engineer with Whirlpool Corporation (Bauknecht). From 1996 to 2003, he was employed as a Function Development Engineer with IAV GmbH, where he was lastly the Head of the Department of Transmission Systems. In 2001, he received a teaching assignment with the Technische Universität Berlin, where he was appointed to a professorship at the Chair of Electronic Measurement and Diagnostic Technology, in 2003. His research interests include measurement technology and data processing, and the diagnosis, predictive maintenance, modeling, simulation, and automatic control of mechatronic systems.