
Sådhanå (2025) 50:52, Indian Academy of Sciences
https://doi.org/10.1007/s12046-025-02713-1

Meta-learning-based approach for IoT data analytics


SAIRAM UTUKURU 1,2,* and P RADHA KRISHNA 2

1 Department of Information Technology, Chaitanya Bharathi Institute of Technology, Hyderabad, Telangana, India
2 Department of Computer Science and Engineering, National Institute of Technology (NIT) Warangal, Hanamkonda, Telangana, India

e-mail: usairam_it@cbit.ac.in; prkrishna@nitw.ac.in
*For correspondence

MS received 17 August 2024; revised 2 November 2024; accepted 6 November 2024

Abstract. Missing data significantly impacts Internet of Things (IoT) applications, causing various issues
depending on the type and amount of the missing information. For example, in wearable device applications,
missing acceleration readings can result in misclassification of physical activities. Handling missing data in IoT
applications, particularly for tasks such as human activity classification, is a complex challenge. In this paper, we
propose a two-stage approach for IoT data classification that handles missing data effectively. In the first stage,
we build an ensemble of heterogeneous classifiers, while in the second stage, a meta-learner is employed to
address high levels of data sparsity without resorting to data imputation or reconstruction methods. The meta-
learner is trained on a dataset that combines the original features with the predictions from the first-stage
classifiers. By leveraging both feature analysis and classifier performance, the meta-learner produces accurate
predictions, ensuring robust results even in the presence of missing data. We consider three missing-data mechanisms: (i) missing at random, (ii) missing completely at random, and (iii) missing not at
random, each with various levels of missing data. Our results demonstrate the viability of the proposed approach
in handling extreme levels of missing data in both training and testing datasets, consistently outperforming state-
of-the-art models.

Keywords. Ensemble classifier; heterogeneous classifiers; meta-learner; incomplete data.

1. Introduction

Despite major advances in data collection, incompleteness remains a critical issue in data analysis. Missing-feature problems are a persistent challenge in machine learning (ML) and data science [1]. To mitigate the impact of missing data in the activity classification of Internet of Things (IoT) applications, it is essential to have robust data management in place, including measures such as data validation, imputation, and backup, to ensure proper handling of missing data. Often, multiple features may contain null or missing values, leading to biased predictions by classifiers [2]. In IoT data, it is common for different variables or observations to exhibit varying patterns of incompleteness, because IoT data is inherently complex and diverse, with different sensors or IoT devices producing distinct types of data, each with varying degrees of reliability and completeness.

There are three mechanisms that give rise to incomplete data: (i) missing at random (MAR), (ii) missing completely at random (MCAR), and (iii) missing not at random (MNAR) [3]. The MAR mechanism occurs when the probability of missing data is unrelated to the missing values themselves but is instead related to other observed variables in the dataset. The MCAR mechanism occurs when the probability of missing data is not related to any other variables in the dataset, including the missing values themselves; in other words, missingness is completely random. The MNAR mechanism occurs when the probability of missing data is directly related to the missing values themselves. Under MNAR, recovering the missing data becomes more challenging, as the relationships between the missing values and the observed data may not be effectively captured by base classifiers, leading to potential biases in analysis and prediction.
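To make the three mechanisms concrete, the following minimal sketch induces each pattern on a numeric feature matrix; the function names, the median-based thresholds, and the doubling of the masking probability are our own illustrative choices, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)

def induce_mcar(X, frac):
    """MCAR: every cell is masked with the same probability,
    independent of any observed or unobserved value."""
    X = X.copy()
    X[rng.random(X.shape) < frac] = np.nan
    return X

def induce_mar(X, frac, driver=0, target=1):
    """MAR: mask `target` more often when the *observed* `driver`
    column is large; missingness depends only on observed data."""
    X = X.copy()
    p = frac * 2 * (X[:, driver] > np.median(X[:, driver]))  # 0 or 2*frac
    X[rng.random(len(X)) < p, target] = np.nan
    return X

def induce_mnar(X, frac, target=1):
    """MNAR: mask `target` more often when its *own* value is large,
    so missingness depends on the values that go missing."""
    X = X.copy()
    p = frac * 2 * (X[:, target] > np.median(X[:, target]))
    X[rng.random(len(X)) < p, target] = np.nan
    return X

X = rng.normal(size=(1000, 4))
print(np.isnan(induce_mcar(X, 0.3)).mean())  # roughly 0.3 overall
```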
In the literature, there are ML techniques that address missing values through imputation before training the model [4, 5]. However, imputation can be challenging and may introduce bias or noise into the data. Batra et al [6] used ensemble classifiers to handle missing values. Ensembles can be categorized as data dependent or data independent: data-dependent ensemble methods rely heavily on the specific structure and characteristics of the dataset, while data-independent methods use statistical measures or techniques that are not tightly coupled to the dataset's unique features.

Ensembles are mostly based on the structure of the predictors, the pattern of missing data, and the type of relationship between the dependent and independent variables.

An ensemble can also be categorized based on the underlying ML models used. The models within an ensemble can be the same (homogeneous) or different (heterogeneous). In homogeneous ensemble techniques, the models tend to be stable because they are less sensitive to minor changes in the training data. Moreover, these models are easier to interpret, as the base models share similar structures and typically identify comparable patterns and relationships. However, if all the base models are flawed, the ensemble may not offer significant improvement over the individual models. On the other hand, a heterogeneous ensemble can result in a more robust final model, because the diverse models capture various aspects of the problem and complement each other's weaknesses. Each classifier contributes unique insights and perspectives, resulting in an aggregate outcome that is often more accurate than any single classifier.

In an ensemble, each classifier's prediction can be treated as a vote for a particular class, and the final outcome is typically determined by a voting strategy. However, when handling missing data, the majority voting principle has the following limitations [7]:

• It determines the result solely based on the class with the majority of votes, without considering class importance.
• A subset of classifiers may agree on an incorrect classification, leading to misclassification.
• When misclassification costs are unequal, relying on votes from a standalone model does not yield an optimal prediction.
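A small worked example of the first two limitations, using hypothetical votes and costs that are not taken from the paper: a correlated subset of classifiers can carry a plain majority vote to the wrong class, while a cost-weighted tally can reverse the outcome.

```python
from collections import Counter

# Hypothetical votes from five base classifiers; the true class is 'walking'.
votes = ['sitting', 'sitting', 'sitting', 'walking', 'walking']

majority = Counter(votes).most_common(1)[0][0]
print(majority)  # 'sitting': a correlated subset of classifiers misleads the vote

# With unequal misclassification costs, a cost-weighted score can differ.
cost_weight = {'sitting': 1.0, 'walking': 2.0}  # assumed costs, for illustration
scores = {c: sum(cost_weight[v] for v in votes if v == c) for c in set(votes)}
print(max(scores, key=scores.get))  # 'walking'
```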
In this work, we propose a meta-learning approach for IoT data analysis in the presence of missing values. Meta-learning is a process of learning to learn, which helps determine the best way of combining classifiers. The meta-learner combines the predictions of different heterogeneous models and learns to weigh them optimally to produce a more accurate final prediction. Figure 1 illustrates a high-level overview of constructing a meta-learner. The training dataset (D) is the input, followed by the induction of MAR, MCAR, and MNAR patterns of missing data, yielding Dm. Heterogeneous classifiers are then applied to Dm to obtain predictions. These predictions, combined with Dm, are used to train the meta-learner, which acts as the final classification model. This approach is particularly effective when independent models misclassify similar patterns [8].

Figure 1. Overview of the proposed approach.

The main contributions of this work are as follows:

(i) Three different combinations of handling missing values in both training and test datasets are explored.
(ii) A meta-learning approach is presented for effectively combining base models in a heterogeneous ensemble classifier, leveraging their individual strengths and weaknesses.
(iii) A robust meta-learner is developed to support extreme cases of missing data.

The rest of the paper is organized as follows. Section 2 presents the related work. Section 3 presents the proposed approach. Section 4 discusses the experimental results. Finally, section 5 concludes the paper.

2. Related work

Different approaches to handling missing data have been extensively studied in the literature [9–11]. Diversity between classifiers can be generated by training them individually with different parameters, training datasets, and subsets of the feature set. Dung et al [12] presented a random subspace method to train classifiers on distinct sub-domains of the feature set. However, this method is effective only when the dataset contains redundant information across all features.

Ensemble methods have also been proposed to address the challenges of missing data. Tasci et al [13] demonstrated that ensembles combining decision trees and neural networks achieve higher accuracy than individual classifiers when applied to high-dimensional datasets. Hasan et al [14] proposed an ensemble of decision tree classifiers, where each decision tree is trained on data imputed using a different technique, such as nearest neighbor, single imputation, and Bayesian multiple imputation. Banjarnahor et al [15] compared the results of an ensemble classifier based on the C4.5 decision tree and k-nearest neighbors (kNN), using different imputations with the expectation maximization multiple imputation (EMMI) technique. All these approaches showed that the ensembles achieved improved accuracy.

Tran et al [16] employed multiple imputations with random subspace methods within an ensemble classifier, demonstrating effective handling of missing data up to 30%. Aleryani et al [17] explored ensemble learning combined with multiple imputations to build diverse classifiers specifically for predicting missing data, and compared this approach against hot-deck and kNN-based single imputation methods.

Melville et al [18] proposed an ensemble method called Diverse Ensemble Creation by Oppositional Relabeling of Artificial Training Examples (DECORATE), which constructs diverse committees by generating artificial training examples. This approach effectively handles missing values by creating complete datasets.
Polikar et al [19] proposed an incremental learning algorithm, Learn++.MF, which generates classifiers trained on random subsets of features. Instances with more than one missing attribute are classified using majority voting by the individual classifiers that did not encounter the missing feature during training.

For minor data loss, ensemble techniques combined with surrogate values or single imputations can be effective. For moderate to large amounts of missing data, ensembles of multiple imputations with conditional inference offer better performance [20]. In our previous work [21], we developed an ensemble classifier using a decision tree model on non-overlapping subsets of the feature set, which outperformed the random forest (RF) classifier when handling up to 50% missing data.

Advanced techniques, such as deep learning, have shown improvements in classification. However, deep neural networks may not enhance accuracy in some cases, especially with small datasets or those prone to overfitting, because deep learning models adjust weights iteratively in each epoch during training, which can amplify the influence of outliers in the data.

State-of-the-art research shows that various ensemble techniques handle missing data by building generalizable classifiers. Most of these methods initially use imputation to fill in missing values; the resulting classifiers are then ensembled to handle moderate levels of missing data (up to 50%). However, many studies do not address performance as missingness increases, or account for the different mechanisms behind incomplete data.

In this paper, we propose a diverse ensemble classifier based on heterogeneous models. We train a meta-learner, with the meta-knowledge captured by exploiting the prediction capabilities of the most promising base classifiers.

3. Proposed methodology

This section first briefly discusses the meta-learning component and then presents the proposed approach in detail.

3.1 Overview of meta-learning

Meta-learning is a technique that leverages the relationships between learning algorithms. For a new task, meta-learning facilitates the comparison and evaluation of learning models, identifying their strengths and weaknesses and then recommending a suitable model, or a combination of models, that maximizes a utility function for the task. The usefulness of a model is determined by mapping the task description to model performance [22]. In general, meta-learning suggests a smaller subset of models compared with the number of available models: if there are A available models, meta-learning suggests B of them, where B is significantly smaller than A (B ≪ A). Meta-learning requires metadata for training, which refers to data about the performance of base-level (first-level) learning models. It addresses model selection based on dataset characteristics or meta-features, which provide insights into model performance. Dataset characterization involves extracting statistical parameters from the training set, including meta-features such as the total number of classes and/or features, the ratio of observations to features, and the degree of correlation between the features and the target variable. Meta-learning is used to select the best algorithm among the base learners.

Figure 2 illustrates the expertise space and landmarking map. The labeled region within the expertise space indicates the landmark classifiers C1, C2, ..., C5. For handling missing data, landmarking may reveal that specific classifiers, like C2 and C4, are more effective because they can handle missing values by ignoring or managing the features with missing data, based on underlying similarities in their mechanisms. Other classifiers, such as C1 and C5, may handle missingness by ignoring observations. Very few classifiers adopt the approach of ignoring both the features with missing data and the observations themselves. Model grouping is considered a form of meta-learning because it extracts major information about base-level learning from the characteristics of the learning models. The main motivation for combining models is to reduce the possibility of misclassifying observations by enhancing model expertise.

Figure 2. Expertise space of learners.

To handle missing values, we aim to use landmark classifiers (simple models that serve as benchmarks) efficiently, based on the correlations between their areas of expertise. When selecting landmark classifiers, we evaluate all candidates to identify which model performs best under the current conditions. If any of these classifiers demonstrates computational efficiency, that classifier can serve as a landmarker. The overall model efficiency will be optimized if one of these efficient classifiers is selected at the next level of classification.

We have chosen different classifiers, namely support vector machine (SVM), logistic regression (LR), kNN, and RF, along with a meta-learner, to address the issue of sparsity in a dataset. The meta-learner can be any ML model, but we typically select a simpler model to ensure quicker training and a lower likelihood of overfitting.

The following considerations guide the selection of the individual classifiers:

• SVM: It handles missing values by ignoring the features with missing data in its calculations for each data point. It uses the available non-missing features to train the model and make predictions. This approach works well when missingness is completely random.
• LR: By default, LR excludes any rows with missing values from the analysis; that is, incomplete observations are omitted. This can lead to a loss of valuable data if many observations have missing values.
• kNN: It identifies the nearest neighbors based on a distance matrix. Missing values are then replaced with weighted averages of the corresponding attributes from these neighbors, preserving the data's correlational structure. However, the default method often ignores observations with missing values when calculating distances.
• RF: It can handle missing data natively by ignoring the missing features; it is also a bagging ensemble technique that mainly focuses on reducing variance.

In this work, we considered RF as the meta-learner, as it constructs multiple trees that are less correlated with each other by sampling both observations and features, which helps maintain a high level of diversity in the ensemble. RF is a bagging ensemble technique that mainly focuses on reducing variance, which is crucial when dealing with datasets that may be sparse or contain noise.

Training principle of the meta-learner: The training metadata can be represented as ⟨f(x), t(x)⟩, where f(x) is the feature vector that captures important characteristics of the input data x, and t(x) is the target value indicating which model from a set of models M performed best on x. The performance p(m, x) of each model m is assessed using a measure such as accuracy, with the target determined as

t(x) = arg max_{m ∈ M} p(m, x).   (1)

This equation identifies the model that achieves the highest performance on input x. The set of models M addresses classification problems by extracting statistical measures such as the number of features, the number of classes, and feature–target correlations. A variation of this approach uses indirect data characterization to explain algorithm performance. In contrast, landmarking evaluates learners based on their task performance, emphasizing their expertise as a primary source of information rather than relying solely on statistical properties [23]. Each classifier offers a unique method for managing missing data, enabling selection based on the characteristics of the data and the nature of the missingness.
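A minimal sketch of how the training metadata ⟨f(x), t(x)⟩ of equation (1) could be assembled, assuming per-instance correctness is used as the performance measure p(m, x) and scikit-learn-style fitted models; the helper name is ours:

```python
import numpy as np

def build_meta_targets(models, X, y):
    """t(x) = argmax_m p(m, x): for each instance, the index of the first
    model that predicts it correctly (ties broken by model order; 0 is
    returned when no model is correct)."""
    correct = np.stack([m.predict(X) == y for m in models])  # (n_models, n)
    return np.argmax(correct, axis=0)
```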
The key difference between our approach and stacking is as follows:

• Stacking splits the data into training and validation sets. The base models are trained on the training set to make estimates on the validation set. The predictions from the base models are then used as input to train the meta-learner on the validation set.

• In contrast, in our approach, the meta-learner is trained on the incomplete training data alongside the predictions from the base models, and this combined model is applied directly to the test data.

3.2 Proposed meta-learning-based approach

Figure 3 illustrates a specific instance of the meta-learning framework (see Figure 1), which consists of the four best-performing heterogeneous classifiers from the expertise space: SVM, LR, kNN, and RF. Below is a step-by-step procedure for invoking the heterogeneous classifiers and the meta-learner, summarized in Algorithm 1.

Algorithm 1. Generating a Meta-learner.

Figure 3. Handling of missing data using meta-learner.

Stage I: invocation of heterogeneous classifiers

Step 1: A training dataset D without missing values is used. Missing data is then induced in D using the three mechanisms, MAR, MCAR, and MNAR, with varying percentages of missing values (ranging from 10 to 95%); the result is denoted Dm (line 1 in Algorithm 1). Suppose we have N training examples, each with M features having missing values (Dm) and a target value y, represented by the training data (Dm, y), where Dm is an N × M feature matrix and y is an N × 1 vector of target values.

Step 2: Using the expertise space, select the best-performing classifiers from the various individual classifiers applied to Dm, such as SVM, LR, kNN, and RF. These heterogeneous models inherently handle missing values in different ways, either by ignoring them or by incorporating them into their training processes (lines 2–3 in Algorithm 1).

Step 3: Each classifier makes predictions on Dm. The predictions from these k models are denoted as f1(Dm), f2(Dm), ..., fk(Dm). Each fi(Dm) is an N × 1 vector of predictions (P) for the target value: f1(Dm) for SVM, f2(Dm) for LR, f3(Dm) for kNN, and f4(Dm) for RF. Since each classifier brings unique predictive capabilities, this ensemble approach offers more value than simply averaging the base models' predictions. The diversity and expertise of the classifiers provide the meta-learner with richer information, leading to better handling of incomplete data (lines 4–6 in Algorithm 1).

Step 4: A new dataset D' is constructed, including the predictions from each classifier and the corresponding percentage of missing data (line 7 in Algorithm 1).

Stage II: invocation of meta-learner

Step 5: Train the meta-learner on D'. RF is used as the meta-learner, taking the predictions from the heterogeneous models and the original features as input. By providing all the predictions along with data containing varying percentages of missing values, the meta-learner becomes more robust, as it learns to associate input features with model predictions (line 8 in Algorithm 1).

Step 6: The trained meta-learner serves as an ensemble classifier designed to handle and classify data with missing values. By training the base models and the meta-learner on varying degrees of missing data, the overall classification performance improves (line 9 in Algorithm 1).
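A condensed sketch of Stages I and II under our reading of the steps, using scikit-learn on synthetic data. Since the standard SVM, LR, and kNN implementations do not accept NaN inputs directly, the sketch substitutes a simple mask-as-zero encoding for the native missing-value handling described in section 3.1, so it illustrates the data flow of Algorithm 1 rather than reproducing the exact models:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(42)

# Step 1: a complete dataset D, then MCAR missingness is induced to get Dm.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           n_classes=4, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)
frac = 0.5
X_tr[rng.random(X_tr.shape) < frac] = np.nan
X_te[rng.random(X_te.shape) < frac] = np.nan

def fill(A):
    # Mask-as-zero stand-in for the models' native handling (an assumption).
    return np.nan_to_num(A, nan=0.0)

# Stage I, Steps 2-3: heterogeneous base classifiers and their predictions.
bases = [SVC(), LogisticRegression(max_iter=1000),
         KNeighborsClassifier(), RandomForestClassifier(random_state=42)]
for clf in bases:
    clf.fit(fill(X_tr), y_tr)

def meta_features(A):
    # Step 4: D' = original features + base predictions + per-row missing %.
    preds = np.column_stack([clf.predict(fill(A)) for clf in bases])
    miss = np.isnan(A).mean(axis=1, keepdims=True)
    return np.hstack([fill(A), preds, miss])

# Stage II, Steps 5-6: RF meta-learner trained on D', applied to test data.
meta = RandomForestClassifier(n_estimators=200, random_state=42)
meta.fit(meta_features(X_tr), y_tr)
print("meta-learner accuracy:", meta.score(meta_features(X_te), y_te))
```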

4. Experimental results

In this work, we focus on handling missing data in both the training and testing datasets (see table 1), reflecting the common occurrence of missing values in real-world scenarios. As shown in table 1, missing values can be present in the training dataset, the testing dataset, or both. We evaluate our approach under the three missing-data mechanisms with varying percentages of missing data.

Table 1. Possibility of missing values

Training dataset          Testing dataset
Without missing values    Without missing values
With missing values       Without missing values
Without missing values    With missing values
With missing values       With missing values
4.1 Dataset description

In this work, we experimented with two datasets: (i) human activity recognition (HAR) [24], which is a balanced dataset, and (ii) activity recognition from a single chest-mounted accelerometer (ARSCM) [25], which is an imbalanced dataset. Table 2 shows the activities in the HAR dataset and the number of samples (in percentage) for each activity. The dataset includes three major feature types: accelerometer, gyroscope, and other features. The accelerometer contributes the most features (345), followed by the gyroscope (213) and other features (3). The whole dataset is split into 70% training data and 30% test data. Data was collected at 50 samples per second using the phone's accelerometer and gyroscope. To reduce noise, a median filter and a third-order low-pass Butterworth filter with a 20-Hz cutoff frequency were applied.

Table 2. Number of samples for each activity in the HAR dataset

S.no.  Activity name        No. of samples (%)
1      Walking              19.14
2      Walking Upstairs     18.69
3      Walking Downstairs   17.49
4      Sitting              16.68
5      Standing             14.59
6      Laying               13.41
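The noise-reduction chain described above for HAR (a median filter followed by a third-order low-pass Butterworth filter with a 20 Hz cutoff, at the 50 Hz sampling rate) can be reproduced roughly as follows; the kernel size and the use of zero-phase filtering are our assumptions:

```python
import numpy as np
from scipy.signal import butter, filtfilt, medfilt

FS = 50.0      # sampling rate (Hz) of the smartphone sensors
CUTOFF = 20.0  # low-pass cutoff (Hz)

def denoise(signal, kernel_size=3):
    """Median filter, then a 3rd-order low-pass Butterworth at 20 Hz."""
    smoothed = medfilt(signal, kernel_size=kernel_size)
    b, a = butter(N=3, Wn=CUTOFF / (FS / 2), btype="low")
    return filtfilt(b, a, smoothed)  # zero-phase filtering (assumed)

acc_x = np.random.default_rng(0).normal(size=500)  # stand-in accelerometer axis
print(denoise(acc_x)[:5])
```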
ARSCM is an imbalanced dataset consisting of accelerometer data collected from 15 participants performing 7 different activities, such as working at a computer, standing, walking, and talking. The data was captured using a wearable accelerometer attached to the chest, with data sampled at 52 Hz. The activity-wise number of samples (in percentages) is shown in table 3. Each participant's data is stored in a separate file, with sequential data points for the X-, Y-, and Z-axis acceleration and numerical labels for the corresponding activity. This dataset is used to identify and authenticate individuals based on their unique motion patterns.

4.2 Evaluation

In real-world scenarios, missing values are common in both training and testing datasets. Many existing methods rely on imputation techniques to handle these gaps. However, our method avoids imputation altogether. We demonstrate that learning directly with missing data enhances the classifier's ability to handle incomplete data during activity classification, particularly when missing values are present in both the training and testing datasets. Missing values are generated using the produce_NA function, as implemented by Mayer et al [26], across various types and percentages of incomplete data. This function introduces missing values according to three specified missing-data mechanisms, namely MAR, MCAR, and MNAR. It returns a data matrix containing both the newly generated and existing missing values, along with an indicator matrix that marks the locations of the new missing values with binary entries (1 for new and 0 otherwise). By learning with a certain percentage of missing values in a specific pattern, the model becomes more robust and adapts to the behavior of the missing data in the training dataset.
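produce_NA itself is provided by the R-miss-tastic platform of Mayer et al [26]; a minimal Python analogue of its MCAR case and of the return convention described above (data matrix plus a 0/1 indicator of the newly introduced gaps) might look as follows, with an interface of our own:

```python
import numpy as np

def produce_na_mcar(X, p_miss, seed=0):
    """Return (X_incomp, mask): X with extra MCAR gaps, and a binary
    indicator matrix that is 1 exactly where a *new* value was removed."""
    rng = np.random.default_rng(seed)
    new = (rng.random(X.shape) < p_miss) & ~np.isnan(X)  # skip existing gaps
    X_incomp = X.copy()
    X_incomp[new] = np.nan
    return X_incomp, new.astype(int)

X = np.arange(12, dtype=float).reshape(4, 3)
X_incomp, mask = produce_na_mcar(X, p_miss=0.4)
print(mask.sum(), "values newly removed")
```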
We applied the proposed method across the different mechanisms for handling incomplete data, testing it from lower to higher percentages of missing values in both the training and testing phases. All base classifiers are trained and tested with missing values, specifically under the MCAR mechanism. The comparison of the proposed method with RF and expectation maximization (EM) for handling a higher percentage of missing data (70%–95%) is presented in table 4 for the MCAR pattern on the HAR dataset. The results in boldface indicate that as the percentage of missing data increases, the performance of the proposed method improves compared with both RF and EM. This suggests that our approach is more robust in dealing with high levels of missing data. We observed that when missing data is present in both the training and test datasets, our method outperforms the scenario where missing data is only present during the test phase.
Table 3. Number of samples for each activity in the ARSCM dataset

S.no.  Activity name                                    No. of samples (%)
1      Working at Computer                              31.68
2      Standing Up, Walking, and Going Up/Downstairs    2.50
3      Standing                                         11.20
4      Walking                                          18.62
5      Going Up/Downstairs                              2.79
6      Walking and Talking with Someone                 2.38
7      Talking While Standing                           30.83

Table 4. Accuracy comparison between RF, meta-learner, and EM with RF for MCAR pattern on the HAR dataset

                   Test 70%                  Test 80%                  Test 90%                  Test 95%
Trained (%)   RF     Meta   EM+RF      RF     Meta   EM+RF      RF     Meta   EM+RF      RF     Meta   EM+RF
10            89.35  89.62  74.38      84.76  83.17  34.68      74.11  71.26  30.98      61.55  64.23  51.92
20            88.90  90.13  72.21      84.87  83.00  28.03      77.71  69.87  22.43      62.54  56.94  44.93
30            88.94  89.92  69.09      84.87  85.71  22.80      78.55  74.45  28.91      56.77  64.71  39.57
40            88.43  90.23  66.17      84.76  86.73  22.29      75.81  69.26  23.79      64.47  66.37  32.37
50            88.23  89.99  60.50      85.85  86.80  20.33      75.16  74.96  20.36      61.66  65.49  26.91
60            87.28  89.35  53.48      84.53  86.22  19.51      77.03  76.89  21.14      61.38  67.05  26.47
70            84.93  87.58  49.95      84.26  86.49  19.55      75.16  77.47  22.46      62.81  67.39  25.86
80            80.29  79.54  44.69      80.83  83.71  20.43      77.16  78.18  22.53      63.35  68.07  26.03
90            63.73  39.60  39.26      68.17  56.57  23.24      70.00  72.96  18.63      64.88  67.87  31.05
95            47.64  24.64  25.89      51.99  27.32  24.64      56.70  55.75  18.15      59.14  60.43  30.98

(For each test percentage: RF = random forest, Meta = RF as meta-learner, EM+RF = EM with RF.)

To demonstrate our model, we compared it with a neural network. For this, we used Keras, a high-level API for building and testing feedforward neural networks on classification tasks. The process begins with data preprocessing, where (a) the input features are standardized using the standard scaler to normalize the data distribution; (b) categorical labels are encoded using a label encoder, converting them into a numerical representation; (c) one-hot encoding is applied to the target variable, transforming it into a binary format suitable for multi-class classification tasks; and (d) principal component analysis is used to reduce the dimensionality of the feature space while retaining the maximum variance in the data. The neural network architecture consists of three hidden layers, with 64, 128, and 64 units, respectively, each employing a rectified linear unit (ReLU) activation function. The output layer uses Softmax activation for multi-class classification. The model is trained using the Adam optimizer with categorical cross-entropy as the loss function, for 22 epochs with a batch size of 256, and validation performance is monitored using the test dataset. Missing data is generated under the MCAR pattern, and predictions are made by selecting the highest probability from the Softmax output.
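From the description above, the Keras baseline can be reconstructed roughly as follows; the layer sizes, optimizer, loss, epoch count, and batch size are taken from the text, while the PCA component count and other defaults are our assumptions:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import LabelEncoder, StandardScaler
from tensorflow import keras

def build_baseline(X, y, n_components=10, n_classes=7):
    # (a) standardize the features, (d) PCA for dimensionality reduction.
    Xp = PCA(n_components=n_components).fit_transform(
        StandardScaler().fit_transform(X))
    # (b) integer-encode the labels, (c) one-hot encode the target variable.
    y_onehot = keras.utils.to_categorical(LabelEncoder().fit_transform(y),
                                          num_classes=n_classes)
    # Three hidden ReLU layers (64, 128, 64) and a Softmax output layer.
    model = keras.Sequential([
        keras.layers.Input(shape=(n_components,)),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(128, activation="relu"),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(Xp, y_onehot, epochs=22, batch_size=256, verbose=0)
    return model  # predicted class = argmax over the Softmax output
```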
Table 5 presents a comparison of the proposed approach against the neural network on the ARSCM dataset, with the better results shown in boldface. The meta-learner outperforms the neural network, especially as the percentage of missing data increases, even in the case of an imbalanced dataset. This superiority is particularly evident when dealing with high levels of missing data, as neural networks tend to struggle when a large portion of the data is absent. Our meta-learner, on the other hand, handles missing data more effectively due to the diversity introduced by the base learners, which improves robustness and classification performance.

Table 5. Accuracy comparison between neural network and meta-learner for MCAR pattern on the ARSCM dataset

                   Test 70%          Test 80%          Test 90%          Test 95%
Trained (%)   Meta   NN         Meta   NN         Meta   NN         Meta   NN
10            56.21  38.69      54.33  35.73      52.77  33.01      50.99  31.87
20            56.01  40.13      53.97  36.54      51.19  33.34      49.86  32.28
30            55.13  40.46      52.09  37.11      50.73  33.99      48.12  32.55
40            54.14  40.71      51.65  37.10      49.63  34.05      47.33  32.54
50            53.46  40.62      50.76  37.71      48.10  34.44      46.85  32.90
60            52.19  39.22      50.10  37.12      47.66  34.34      45.11  32.85
70            51.02  39.63      49.86  37.36      46.27  34.24      44.01  32.87
80            50.78  39.39      48.22  36.63      45.88  34.06      43.19  32.80
90            49.30  37.79      46.98  35.91      43.99  33.75      41.87  32.69
95            42.91  36.93      41.87  34.80      40.01  33.27      39.76  32.22

(For each test percentage: Meta = RF as meta-learner, NN = neural network.)

5. Conclusion

In IoT deployments, missing data is often inevitable due to environmental factors or communication disruptions.
Human activity classification becomes particularly challenging in scenarios with missing data. In this work, we addressed the issue of human activity classification without any imputation techniques. Our method employs a two-level classifier. At the first level, a set of heterogeneous classifiers, including SVM, LR, kNN, and RF, is used. These classifiers inherently handle missing data by ignoring features or observations, depending on their algorithmic structures. At the second level, RF is used as a meta-learner. The meta-learner is trained on the predictions of the first-level classifiers. These predictions, combined with specific percentages of missing data, help the meta-learner adapt to incomplete data scenarios, making it more robust. Our approach leverages the diversity of the base classifiers and their individual strategies for dealing with missing values. By training the meta-learner on incomplete data, the model is reinforced to handle missing data without the need for imputation. Our experiments consider the MCAR mechanism, testing with low (10 to 30%), moderate (40 to 60%), and high (70 to 95%) percentages of missing values. The experimental results demonstrate that the proposed method consistently improves HAR performance by training and testing on incomplete data. The heterogeneous ensemble mechanism, coupled with the meta-learner's ability to learn from the predictions of the base classifiers, allows for more accurate human activity classification. This is particularly beneficial for IoT deployments, where data loss is common and imputation may not always be feasible.

Acknowledgements

The authors are thankful to Kamalakar Karlapalem of IIIT Hyderabad for his insightful comments, discussions, and valuable contributions to the testing of the classifier, which significantly enhanced the presentation of this paper.

References

[1] Ilić I D, Višnjić J M, Randjelović B M and Mitić V M 2021 Missing data samples: systematization and conducting methods: a review. Facta Universitatis, Series: Mathematics and Informatics, pp 191–204
[2] Sarker I H, Khan A I, Abushark Y B and Alsolami F 2023 Internet of things (IoT) security intelligence: a comprehensive overview, machine learning solutions, and research directions. Mobile Netw. Appl. 28(1): 296–312
[3] Little R, Carpenter J and Lee K 2024 A comparison of three popular methods for handling missing data: complete-case analysis, inverse probability weighting, and multiple imputation. Sociol. Method. Res. 53(3): 1105–1135
[4] Tsai C and Hu Y 2022 Empirical comparison of supervised learning techniques for missing value imputation. Knowl. Inf. Syst. 64(4): 1047–1075
[5] Lin W, Tsai C and Zhong J R 2022 Deep learning for missing value imputation of continuous data and the effect of data discretization. Knowl.-Based Syst. 5(239): 108079
[6] Batra S, Khurana R, Khan M Z, Boulila W, Koubaa A and Srivastava P 2022 A pragmatic ensemble strategy for missing values imputation in health records. Entropy 24(4): 533
[7] Aziz D and Sztahó D 2024 Automatic cross- and multi-lingual recognition of dysphonia by ensemble classification using deep speaker embedding models. Expert Systems e13660
[8] Liu Q and Hauswirth M 2020 A provenance meta learning framework for missing data handling methods selection. In: 2020 11th IEEE Annual Ubiquitous Computing, Electronics and Mobile Communication Conference (UEMCON): 0349–0358
[9] Miao X, Wu Y, Chen L, Gao Y and Yin J 2022 An experimental survey of missing data imputation algorithms. IEEE Trans. Knowl. Data Eng. 35(7): 6630–6650
[10] Sun Y, Li J, Xu Y, Zhang T and Wang X 2023 Deep learning versus conventional methods for missing data imputation: a review and comparative study. Expert Syst. Appl. 227: 120201
[11] Luo Y 2022 Evaluating the state of the art in missing data imputation for clinical data. Briefings Bioinform. 23(1): bbab489
[12] Dung N V, Trung N L and Abed-Meraim K 2021 Robust subspace tracking with missing data and outliers: novel algorithm with convergence guarantee. IEEE Trans. Signal Process. 69: 2070–2085
[13] Taşcı E, Ülütürk C and Uğur A 2021 A voting-based ensemble deep learning method focusing on image augmentation and preprocessing variations for tuberculosis detection. Neural Comput. Appl. 33(22): 15541–15555
[14] Hasan M, Alam M, Roy S, Dutta A, Jawad M T and Das S 2021 Missing value imputation affects the performance of machine learning: a review and analysis of the literature 2010–2021. Inf. Med. Unlocked 27: 100799
[15] Banjarnahor J, Zai F, Sirait J, Nainggolan D W and Sihombing N G D 2023 Comparison analysis of C4.5 algorithm and KNN algorithm for predicting data of non-active students at Prima Indonesia University. Sinkron: Jurnal dan Penelitian Teknik Informatika 7(4): 2027–2035
[16] Tran C and Nguyen B 2024 Random subspace ensemble for directly classifying high-dimensional incomplete data. Evolut. Intell. 1–13
[17] Aleryani A, Bostrom A, Wang W and Iglesia B 2023 Multiple imputation ensembles for time series (MIE-TS). ACM Trans. Knowl. Discovery Data 17(3): 1–28
[18] Melville P and Mooney R 2003 Constructing diverse classifier ensembles using artificial training examples. In: Proceedings of IJCAI 3: 505–510
[19] Polikar R, DePasquale J, Mohammed H S, Brown G and Kuncheva L I 2010 Learn++.MF: a random subspace approach for the missing feature problem. Pattern Recognit. 43(11): 3817–3832
[20] Emmanuel T, Maupong T, Mpoeleng D, Semong T, Mphago B and Tabona O 2021 A survey on missing data in machine learning. J. Big Data 8: 1–37
[21] Utukuru S, Krishna P R and Karlapalem K 2023 Missing data resilient ensemble subspace decision tree classifier. In: Proceedings of the 6th Joint International Conference on Data Science and Management of Data (10th ACM IKDD CODS and 28th COMAD): 104–107
[22] Giraud-Carrier C, Brazdil P, Soares C and Vilalta R 2009 Meta-learning. In: Encyclopedia of Data Warehousing and Mining, 2nd edition, IGI Global: 1207–1215
[23] Khwaja A S, Anpalagan A, Naeem M and Venkatesh B 2020 Joint bagged-boosted artificial neural networks: using ensemble machine learning to improve short-term electricity load forecasting. Electric Power Syst. Res. 179: 106080
[24] Garcia-Gonzalez D, Rivero D, Fernandez-Blanco E and Luaces M R 2020 A public domain dataset for real-life human activity recognition using smartphone sensors. Sensors 20(8): 2200
[25] Casale P, Pujol O and Radeva P 2012 Personalization and user verification in wearable systems using biometric walking patterns. Personal Ubiquitous Comput. 16: 563–580
[26] Mayer I, Sportisse A, Josse J, Tierney N and Vialaneix N 2019 R-miss-tastic: a unified platform for missing values methods and workflows. arXiv preprint arXiv:1908.04822

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
