1 Introduction

Human activity recognition (HAR) systems are gaining popularity in this digital age, as they are used to identify context-aware activities that support accurate medical services in the early stage of a disease, as well as various ambient intelligence applications (Lima et al. 2021). Recently, during the Covid-19 pandemic, “work from home”, i.e., working with computers for long periods without going outside, increased physical inactivity among the population, with very negative impacts on health (Rezende et al. 2016). Physical activities, such as walking, moving upstairs, moving downstairs, sitting, standing, and lying, can be monitored using smartphone-based HAR frameworks to reduce physical inactivity while staying at home. HAR is also helpful for monitoring rehabilitation from cardiac and neurological diseases. An efficient and robust implementation of a HAR system depends on capturing physical signals of human activity with the help of sensors. Typically, wearable and ambient sensors are used to capture these signals. In the case of wearable sensors, users often find it problematic to wear additional hardware on the body. Furthermore, wearable sensors require dedicated hardware and sophisticated signal processing techniques (Chen et al. 2019). On the other hand, ambient sensors can capture data only in restricted areas (where the sensors are located) and raise privacy concerns. Recent advances have made the smartphone, with its many built-in sensors, a powerful tool for collecting physical human activity signals. Due to its noninvasive nature and the diversity of its built-in sensors, the smartphone is widely adopted for HAR.

Raw smartphone sensor signals are not in an appropriate form for classification because their values fluctuate and oscillate strongly. Thus, it is very difficult to identify the underlying patterns using these raw signals. Moreover, without extracting proper features, the classifier fails to distinguish similar human physical activities such as walking and moving upstairs. Features extracted from calibrated sensor data may contain redundant and irrelevant data, which can affect classification performance. It is critical to use an effective feature selection strategy to discover a subset of the most discriminative features that can improve classification performance while also removing redundant features that provide no new information to the classifier. Moreover, the dimension of the feature set is a big concern for researchers, as a high-dimensional feature set increases the training time of the model and also leads to overfitting. Both effects have a great impact on the performance of HAR systems, because high-dimensional feature sets often lead to poor results and hamper the creation of a good and reliable HAR prediction model. Insignificant and redundant features reduce the performance of the classifier (Yan et al. 2020). So, before feeding the data to the classification algorithm to recognize physical human activities, it is necessary to extract relevant features to improve the classification accuracy. Several families of feature selection methods exist. Some methods rank variables or select features that minimize a given criterion (filter methods). Other methods check the performance of the features for a given classifier (wrapper methods) or select features during the execution of the classification task (embedded methods). The wrapper method is more complex than the filter method. Moreover, wrapper methods are prone to overfitting and are dependent on the classifier. Filter methods ignore interaction with the classifier and rely only on the general characteristics of the data.

For example, ReliefF is a filter method that is widely used in HAR (Karagiannaki et al. 2016). However, ReliefF has two major drawbacks: it is unable to remove redundant features efficiently, and its performance relies on the size of the training data (Zhang and Sawchuk 2011). Embedded methods generally lead to better results compared to filter and wrapper methods. Random Forest (RF) based feature selection methods are embedded methods. Methods that regularize RF (Deng 2013) have shown their efficiency in reducing the model complexity while providing a compact set of features. Regularized RF models, which ignore highly correlated features, were previously tested in genetic research applications (Deng 2013). As a result, regularized feature selection methods do not lead to data loss. Recently, one RF-based feature selection method, namely Guided Random Forest (GRF), was applied in human activity recognition (Uddin and Uddiny 2015). Though GRF succeeds in selecting relevant features, it fails to select non-redundant features (RColorBrewer et al. 2018). Despite the extensive research, not all RF-based feature selection algorithms have been tested in HAR. The aforementioned issues motivate us to propose an integrated HAR system that uses both feature extraction and the Guided Regularized Random Forest (GRRF) (Deng and Runger 2012) feature selection method to obtain the best feature sets. GRRF uses the importance scores computed from an RF trained on all the training data. With GRRF, it is not required to fix the number of selected features in advance, and the method is also independent of any threshold value on the feature importance. GRRF uses coefficient regression to select the features. Due to regularization, GRRF is capable of selecting features that are both relevant and non-redundant. The GRRF feature selection method performs well in several domains, such as gene selection and remote sensing (Deng 2013; Mureriwa et al. 2016). To the best of our knowledge, this is the first proposal for using GRRF in a smartphone-based HAR system.

1.1 Contribution

The scientific contribution of this research work is two-fold: (1) Using the GRRF feature selection method, we provide an accurate HAR framework with low computational complexity. Because it relies on a minimum feature set, its computation time is low, so it can be used on small computational devices for real-life HAR applications such as online health monitoring. (2) The preprocessing and segmentation of the self-collected dataset is another scientific contribution of this research, which can be used for other scientific research, as we will release this dataset publicly in the future.

The following contributions are made in this paper:

  1. A new implementation of GRRF based HAR system which can efficiently and accurately identify six basic human physical activities (sitting, standing, lying, walking, upstairs, downstairs) using 3-axial accelerometer and gyroscope sensors of smartphones.

  2. In-depth analysis of the behavior of the proposed GRRF based HAR system using different classifiers.

  3. Comparison of the proposed system with two different benchmark schemes such as ReliefF and GRF and also with other state-of-the-art approaches using two different datasets.

2 Related work

To better understand human behavior and serve human-centric applications, much work has been done in recent years to investigate various feature selection methods and propose a plethora of categorization models to improve recognition accuracy  (Thakur and Biswas 2020).

In Gupta and Dallas (2014), the authors proposed a waist-mounted 3-axial accelerometer-based HAR system using both feature extraction and ensemble feature selection. The authors used time and frequency domain features followed by the ReliefF and sequential forward floating search (SFFS) feature selection methods. For classification, the authors used Naive Bayes (NB) and k-Nearest Neighbor (KNN), and they achieved an accuracy of more than 95% for each activity. In this work, in comparison to ReliefF, SFFS selected nearly half the number of features and gave higher accuracy. However, SFFS is used as a search strategy in the wrapper feature selection method, which takes high computational time and is more prone to overfitting. Atallah et al. (2011) evaluated the optimal sensor position and relevant features using six wearable accelerometers in addition to the ear-worn activity recognition sensor (e-AR) at various body positions. ReliefF (Kira et al. 1992), Simba (Gilad-Bachrach et al. 2004), and Minimum Redundancy Maximum Relevance (mRMR) (Peng et al. 2005) were examined as filter-based feature selection approaches for each sensor. For activity classification, the KNN method (with k = 1, 5, and 7) and the Bayesian classifier were used. The activities were divided into five categories: extremely low level, low level, medium level, high level, and transitional. The results of the three feature selection algorithms, as well as the classification performance of the KNN (k = 5 and 7) and Bayesian classifiers, were very similar. However, the results revealed that none of the sensor locations could provide high precision and recall for all of the groups on its own. In (Capela et al. 2015), the authors calculated 76 signal features and also selected a subset of features based on three different filter-based feature selection methods: ReliefF, Correlation-based Feature Selection (CFS), and Fast Correlation Based Filter (FCBF). The authors achieved a higher accuracy of \(97.52\%\) using the CFS method. An unsupervised graph-based strategy was used in the HAR system in (Karagiannaki et al. 2016). In this work, ReliefF and SFFS were used as feature selection strategies to select features from the UCI HAR dataset, and ReliefF performed well on this dataset. In (Suto et al. 2016), the authors presented wearable sensor-based HAR with both feature selection and extraction. Here, the authors used the “Naive Bayesian” wrapper feature selection method and also compared it with some popular filter feature selection methods. According to the results of this research, the wrapper strategy beats filter algorithms in the HAR domain. The authors in (Nguyen et al. 2018) used wearable sensors with KNN to recognize human activities. In this work, the authors also used feature extraction followed by feature selection. Here, feature selection was performed separately on each sensor position, and an overall accuracy of \(99.13\%\) was achieved. However, the complexity of the proposed method is high, as it relies on position-based feature selection. The authors in (Tian et al. 2020) proposed wearable sensor-based HAR using wavelet energy spectrum features followed by an ensemble-based filter feature selection method. The findings of the experiments suggested that wavelet decomposition based features can improve activity detection accuracy and increase activity discrimination. In this work, the authors combined four different filter-based feature selection methods, including ReliefF, in an ensemble. In (Wang et al. 2016), the authors used time and frequency domain features and a hybrid filter and wrapper method for feature selection. In this work, the authors used a smartphone accelerometer and gyroscope to collect data and proposed a hybrid feature selection method that selected 66 features out of 561 features and achieved an accuracy of \(91\%\). In another work (Wang et al. 2016), the authors used a single wearable triaxial accelerometer with “Ensemble Empirical Mode Decomposition (EEMD)” based features and a feature selection method to identify seven activities with an average accuracy of \(78.12\%\) and \(81.12\%\) for two different locations of the accelerometer. Recently, a hybrid method of filter and wrapper feature selection was proposed in (Ahmed et al. 2020) for smartphone-based HAR systems. SFFS was used with SVM to select features. The SVM was also used as the classifier and achieved an accuracy of \(98.13\%\).

According to the HAR literature, the ReliefF feature selection method is widely used. In ReliefF, feature evaluation is based on how well the features distinguish between close samples. ReliefF is computationally inexpensive and does not employ classification accuracy as an evaluation criterion. However, because it is based on feature weighting, only the weights of features that are highly associated with the class label are increased during feature selection; therefore, redundant features cannot be efficiently eliminated.

RF-based feature selection methods are almost unexplored in the field of smartphone-based HAR. To the best of our knowledge, the GRF feature selection method was used for HAR with different datasets in (Uddin and Uddiny 2015). However, the authors did not explain the processing of the datasets used in the experiments. Also, in GRF, several features can share the same information gain at a node with a small number of instances and a large number of features (Deng and Runger 2012). As a result, GRF suffers from the node sparsity issue and therefore fails to select non-redundant features. In real applications, manually extracted features combined with proper feature selection are used to implement efficient smartphone-based HAR models. In this paper, we aim to implement a smartphone-based HAR model using the GRRF-based feature selection method.

3 Problem definition

This paper aims to explore how to enhance the performance of human physical activity recognition through feature extraction followed by an RF-based feature selection method, namely GRRF. Let the extracted relevant feature set be \(F_k\) and let \(y_k\) denote an activity class. The mapping between the relevant feature vectors and the activity classes is performed by the classification algorithm A. Two different datasets are used to investigate the proposed method extensively; a dataset is denoted by DS. The mapping function is represented as

$$\begin{aligned} A :F_k \times DS \rightarrow y_k \end{aligned}$$
(1)

Our aim is to find the minimum relevant feature set \(F'_k \subseteq F_k\) such that the classifier model in equation (2)

$$\begin{aligned} A' :F'_k \times DS_{train} \rightarrow y_k \end{aligned}$$
(2)

is trained using \(DS_{train}\). The trained classifier \(A'\) can then effectively and accurately recognize the human physical activities in the test dataset \(DS_{test}\), where \(DS_{train} \cap DS_{test}=\emptyset\).

4 Preliminaries

The main focus of this paper is to extract relevant and non-redundant features using an RF-based feature selection method namely GRRF, for smartphone-based HAR to improve the classification accuracy and reduce the training time.

4.1 Random forest

RF is an ensemble machine learning algorithm consisting of decision trees, which is fast and robust to the noise of target data  (Kontschieder et al. 2011). The main concept behind this RF is to reduce the prediction error taking into consideration the ensemble of decision trees and the correlation among their predictions  (Chan et al. 2008).

Focusing on a single tree of the forest, let \(D_k \in \mathbb {R}^{S_k \times F_k}\), where k denotes the kth subset of the instances \(\left( S_k\right)\) and features \(\left( F_k\right)\). \(D_k\) is generated from the original data \(\left( X \in \mathbb {R}^{S \times F}\right)\), with S instances and F features, using random selection with replacement. For splitting the instances \(\left( S_k\right)\) at each node, the features in the subset \(\left( F_k\right)\) are treated as candidates. The Gini index \(\left( GI\right)\) is used to find the best splitting feature and its threshold value. Instances whose value of the selected feature is higher than the threshold are sent to the right node \(\left( v^R\right)\); otherwise, they are sent to the left node \(\left( v^L\right)\). In this way, after successive splits, the instances reach the terminal nodes from the root node \(\left( v^n\right)\). The terminal nodes are the leaves, which finally give the predictions for the instances. The ensemble prediction is obtained by combining the results of the individual trees of the forest. For classification, the majority vote rule is used (Criminisi et al. 2012): \(\hat{Y} = \mathrm {mode}_{k=1 \cdots n_{trees}}\hat{Y_k}\), where \(\hat{Y_k}\) is the prediction of the kth tree and \(n_{trees}\) denotes the total number of trees in the RF.
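The majority vote rule can be illustrated with a few lines of Python. This is only a minimal sketch: the function name `majority_vote` and the example tree outputs are hypothetical, and ties are resolved in favor of the label seen first, which is an arbitrary choice not specified in the text.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the class predicted by the largest number of trees."""
    if not tree_predictions:
        raise ValueError("at least one tree prediction is required")
    # most_common(1) returns [(label, count)] for the most frequent label.
    return Counter(tree_predictions).most_common(1)[0][0]


# Hypothetical outputs of five trees for one window of sensor data.
votes = ["walking", "upstairs", "walking", "walking", "upstairs"]
print(majority_vote(votes))
```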

The number of features used as split candidates \(\left( F_k\right)\) and the number of trees \(\left( n_{trees}\right)\) are the two parameters to tune when optimizing an RF. In general, the number of split candidates is taken as \(\sqrt{F}\) for classification, and the number of trees is chosen large enough to balance performance and processing time. “K-fold cross-validation” is one of the methods to optimize these parameters.

4.2 RF as feature selection method

For a training set \(\textit{X}\) with n instances, where each instance contains p features, \(\textit{X}\) can be represented by a matrix \([x_{i,j}]\), \(i \in \{1,2, \cdots ,n\}\) and \(j \in \{1,2, \cdots ,p\}\). The label vector Y is represented as \([y_i]\), \(i \in \{1,2, \cdots ,n\}\). As we are interested in feature selection, we can also represent \(\textit{X}\) as \([X_1, X_2,\cdots , X_p]\), i.e., \(X_k\) denotes the vector of feature k. RFs utilize the Gini index to build decision trees and decide the final class in each tree. Therefore, to measure the impurity of a node v, the Gini index of that node is used (Deng and Runger 2012) and is denoted by

$$\begin{aligned} Gini(v) = \sum _{k=1}^{K} {s_k}^v(1-{s_k}^v) \end{aligned}$$
(3)

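where \({s_k}^v\) is the fraction of class-k records at node v and K is the number of classes (in this equation, k indexes the classes, not the features). Node v is split into a left child \(v^L\) and a right child \(v^R\).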
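As an illustration of equation (3), the following sketch computes the Gini index of a node from the class labels of the instances it contains. The function name `gini_impurity` and the example labels are assumptions made for this illustration.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini index of a node: sum over classes of s_k * (1 - s_k)."""
    n = len(labels)
    if n == 0:
        return 0.0
    # s_k is the fraction of records at the node that belong to class k.
    fractions = [count / n for count in Counter(labels).values()]
    return sum(s * (1.0 - s) for s in fractions)


# Hypothetical node holding two "walking" and two "sitting" windows.
print(gini_impurity(["walking", "walking", "sitting", "sitting"]))  # 0.5
# A pure node has zero impurity.
print(gini_impurity(["lying", "lying", "lying"]))  # 0.0
```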

Feature \(X_k\) at node v has the Gini information gain denoted by \(Gain(X_k,v)\), based on which node v is split.

$$\begin{aligned} Gain(X_k, v) = {} Gini(X_k, v) - w^LGini(X_k, v^L) - w^RGini(X_k, v^R) \end{aligned}$$
(4)

where \(w^L\) and \(w^R\) are the proportions of instances assigned to the left and right child of node v, respectively. The node is split on the feature with the maximum \(Gain(X_k,v)\). The importance score for feature \(X_k\) is obtained as
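The gain of equation (4) can be sketched as follows for a single feature and a single candidate threshold. The helper names `gini` and `gini_gain` and the toy feature values are hypothetical; instances with a value above the threshold go to the right child, as described in Sect. 4.1.

```python
from collections import Counter


def gini(labels):
    """Gini index of a set of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum((c / n) * (1.0 - c / n) for c in Counter(labels).values())


def gini_gain(values, labels, threshold):
    """Gini gain obtained by splitting a node on one feature at `threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    w_left, w_right = len(left) / n, len(right) / n
    return gini(labels) - w_left * gini(left) - w_right * gini(right)


# Hypothetical feature values (e.g. signal standard deviation) and labels.
values = [0.1, 0.2, 0.8, 0.9]
labels = ["sitting", "sitting", "walking", "walking"]
print(gini_gain(values, labels, threshold=0.5))   # perfect split: gain 0.5
print(gini_gain(values, labels, threshold=0.15))  # poorer split: smaller gain
```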

$$\begin{aligned} {Impscore}_k= \dfrac{1}{n_{tree}} \sum _{v\in S_{X_k}}Gain(X_k, v) \end{aligned}$$
(5)

where \(S_{X_k}\) is the set of nodes split by feature \(X_k\) in an RF with \(n_{tree}\) trees. The importance score of the RF is used to assess the contribution of each feature to the prediction of the classes.
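A minimal sketch of equation (5) is given below, assuming that each tree is summarized as a list of (feature index, gain) pairs, one per split node. This representation, the function name `importance_scores`, and the numbers are assumptions for illustration only and do not correspond to the data structures of any particular RF library.

```python
from collections import defaultdict


def importance_scores(forest_splits):
    """Sum the gains of all nodes split by each feature, divided by n_tree."""
    n_tree = len(forest_splits)
    totals = defaultdict(float)
    for tree in forest_splits:
        for feature, gain in tree:
            totals[feature] += gain
    return {feature: total / n_tree for feature, total in totals.items()}


# Hypothetical forest of two trees; each split is (feature index, gain).
forest = [
    [(0, 0.5), (2, 0.125)],
    [(0, 0.25), (1, 0.125)],
]
print(importance_scores(forest))  # {0: 0.375, 2: 0.0625, 1: 0.0625}
```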

4.3 Regularized RF

Regularized RF (RRF) is used to reduce the number of features selected for classification (Deng and Runger 2012). The gain in RRF for each feature \(X_k\) is represented as

$$\begin{aligned} \left. \begin{array}{r@{\;}l} G_{RRF}(X_k, v) = G(X_k, v) \quad if\, k \in F \\ G_{RRF}(X_k, v) = \lambda G(X_k, v) \quad if\, k \notin F \end{array}\right\} \end{aligned}$$
(6)

Here \(G(X_k, v)\) is the gain \(Gain(X_k, v)\) of equation (4), F is the set of features used to split the instances at the previous nodes, and \(\lambda \in [0,1]\) is a penalty factor applied to features not selected at previous nodes. RRF can select non-redundant features, because features whose gain value is zero are not included in the selected feature set.
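To make the penalty of equation (6) concrete, the sketch below applies it to two candidate features at a node. The function name `rrf_gain`, the feature indices, and the gain values are illustrative assumptions. The example shows that a new feature is chosen only if its penalized gain still exceeds that of an already selected feature.

```python
def rrf_gain(gain, feature, selected, lam):
    """Regularized gain: unchanged for selected features, scaled by lam otherwise."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return gain if feature in selected else lam * gain


# Hypothetical node: feature 3 was used at a previous node, feature 7 was not.
selected = {3}
raw_gains = {3: 0.20, 7: 0.22}
lam = 0.8

best = max(raw_gains, key=lambda k: rrf_gain(raw_gains[k], k, selected, lam))
# Feature 7 has the larger raw gain, but its penalized gain 0.8 * 0.22 = 0.176
# is below 0.20, so the already selected feature 3 is reused.
print(best)  # 3
```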

5 Methodology

In this section, we describe the proposed methodology in detail, which is shown in Fig. 1.

Fig. 1 Work flow of proposed HAR model

5.1 Data collection

In this work, the data is collected using a Samsung Galaxy On-Max Android smartphone. We developed an Android application to collect sensor data for six different human physical activities: sitting, standing, walking, lying, walking upstairs, and walking downstairs. The application records the tri-axial accelerometer and gyroscope sensors at a frequency of 50 Hz, with the device kept in a front pant pocket or in the hand. The data is collected from 25 subjects, including 15 females and 10 males, aged about 15–45 years, with heights of about 163–172 cm and weights of about 52–65 kg. The subjects are healthy without any medical complications. All subjects are asked to perform the six normal human physical activities. Each activity is performed for three minutes and repeated five times by each subject. Hence, the dataset is balanced. All the activities are performed in both indoor and outdoor conditions. This dataset consists of 15,562 instances and 164 features (Thakur and Biswas 2021; Thakur and Suparna 2021).

The other, public dataset is taken from the open-source “UCI Machine Learning Repository” (Anguita et al. 2013). According to the HAR literature, the UCI HAR dataset is widely used by the research community of the HAR domain. This smartphone sensor-based dataset covers 30 subjects aged about 19 to 48 years and 6 different human physical activities: sitting, standing, walking, lying, walking upstairs, and walking downstairs. Here, a waist-mounted smartphone (Samsung Galaxy S II) with in-built sensors is used. Both the accelerometer and gyroscope sensors are used to collect the data. In this dataset, the total number of instances is 10,299 and the number of features is 561.

5.2 Data pre-processing

The raw data collected from the smartphone’s in-built sensors contain noise. As the data is collected using a mobile application, there may be missing and redundant data due to minor variations in the way different individuals performed the data collection. To remove the high-frequency noise and the gravitational acceleration from the signal, a low-pass elliptic filter with a 20 Hz cutoff frequency is applied, followed by a high-pass elliptic filter with a 0.5 Hz cutoff frequency. Each signal is divided into 5 s sliding windows with an overlap of 2 s between two consecutive windows, as the state-of-the-art literature (Chen et al. 2019) shows that a 2–5 s sliding window with a 20–50 Hz frequency is ideal for the segmentation of the collected data.
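The segmentation step can be sketched as follows. This is a minimal illustration of overlapping sliding windows only (the filtering step is not shown); the function name `sliding_windows` and its parameter names are assumptions. At 50 Hz, a 5 s window holds 250 samples and a 2 s overlap corresponds to 100 samples, so consecutive windows start 150 samples apart.

```python
def sliding_windows(signal, fs_hz=50, window_s=5.0, overlap_s=2.0):
    """Split a 1-D signal into fixed-length windows with a fixed overlap."""
    size = int(round(window_s * fs_hz))
    step = size - int(round(overlap_s * fs_hz))
    if step <= 0:
        raise ValueError("overlap must be shorter than the window")
    # Trailing samples that do not fill a whole window are dropped.
    return [signal[start:start + size]
            for start in range(0, len(signal) - size + 1, step)]


# Hypothetical 20 s recording at 50 Hz (sample indices stand in for values).
recording = list(range(1000))
windows = sliding_windows(recording)
print(len(windows), len(windows[0]), windows[1][0])  # 6 250 150
```

With `window_s=2.56` and `overlap_s=1.28`, the same function yields 128-sample windows with 50% overlap.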

The public UCI dataset is already pre-processed using a noise filter. Here, Butterworth low pass filter was used to separate the gravitational components from the body motion components. In this, a low pass filter with 0.3 Hz cutoff frequency was used, assuming the gravitational force has only low-frequency components. Then fixed-width sliding windows of 2.56 s with 50% overlapping were used.

5.3 Feature extraction

Feature extraction is a key phase in any classification system. Time-domain features capture signal or statistical metrics from raw signals and show how the signals vary over time. Time-domain features are mainly applicable in online and real-time activity recognition because of their low computational time. On the other hand, frequency-domain features are useful to describe the distribution of signal energy and are effective for the identification of repetitive activities. So, we decide to use time-frequency domain statistical features. From our self-collected data, we have extracted 17 time-frequency domain features. The features are extracted from each input window, for both the tri-axial acceleration (Acc) and gyroscope (Gyro) signals. Some of the features are also calculated for both body acceleration (BA) and gravitational acceleration (GA). The resultant acceleration and angular velocity are also calculated from the Acc and Gyro signals, respectively. The resultant signal (RS) for both the Acc and Gyro is defined as \(\sqrt{x^2+y^2+z^2}\). Hence, the features are extracted from eleven different signals. Altogether, we extract 164 features from our self-collected dataset.
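As an illustration, the sketch below computes the resultant signal of one window and a few simple time-domain statistics. The four statistics shown (mean, standard deviation, minimum, maximum) are assumptions chosen for the example and are not claimed to be the 17 features used in this work; the function names are hypothetical as well.

```python
import math
import statistics


def resultant(x, y, z):
    """Resultant signal sqrt(x^2 + y^2 + z^2), computed sample by sample."""
    return [math.sqrt(a * a + b * b + c * c) for a, b, c in zip(x, y, z)]


def time_domain_features(window):
    """A few illustrative time-domain statistics of one window."""
    return {
        "mean": statistics.mean(window),
        "std": statistics.pstdev(window),
        "min": min(window),
        "max": max(window),
    }


# Hypothetical two-sample window of tri-axial acceleration.
acc_x, acc_y, acc_z = [3.0, 0.0], [4.0, 0.0], [0.0, 5.0]
rs = resultant(acc_x, acc_y, acc_z)
print(rs)                        # [5.0, 5.0]
print(time_domain_features(rs))
```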

In the public UCI dataset, the feature vector is also extracted by computing variables from the time and frequency domain  (Anguita et al. 2013). From the UCI dataset 561 features are extracted, which is mentioned in Anguita et al. (2013).

5.4 Feature selection

After feature extraction, feature selection is used to reduce the dimension of the data and thereby the computational cost. In our work, we use Random Forest (RF) as the basis of the feature selection algorithm. RF is an embedded feature selection method. It is robust to overfitting, has a reasonable computing cost, and is easy to interpret. The RF classifier is commonly used to estimate feature importance, and it naturally handles numerical and categorical variables, different scales, interactions, and nonlinearities. Although the RF feature importance scores can be used to choose the K features with the highest importance scores individually, there could be redundancy among these K features. So, building on the RF concept, the Guided Regularized Random Forest (GRRF) (Deng and Runger 2012) is used for feature selection to select relevant and non-redundant features based on the importance score of a group of features without any data loss.

Algorithm 1 GRRF algorithm

The use of regularization ensures that the features selected using GRRF are non-redundant and relevant (Deng and Runger 2012). In GRRF, an importance score is used to identify the most relevant features for a particular domain and to guide the feature selector. The importance score of the features is calculated using equation (5), where the RF gain is used. Accordingly, the gain is calculated while splitting the samples at each node using an impurity measure. Several measures are mentioned in the RF feature selection literature, such as permutation importance, subsampling without replacement, and so on. However, GRRF uses the most popular criterion in classification, namely the Gini Index (GI) function. The GI, which is easy and fast to compute (Nembrini et al. 2018), measures the probability of misclassification as \(GI = 1 - \sum _{i=1}^{n_c}(p_i)^2\), where \(n_c\) is the total number of classes and \(p_i\) denotes the probability of class i. GRRF calculates the gain, and ultimately the feature importance score, based on the RF training data. In GRRF, the importance score in equation (5) is calculated using all the nodes instead of a single node. GRRF has a specific regularization parameter for each feature. The regularization parameter preserves the RF gain of equation (4) for the previously selected features and penalizes the gain of new features. Therefore, features with high importance scores are favored, since the gain of features with low importance scores is penalized more heavily. Dimensionality reduction may cause information loss. However, in some cases, it is possible to represent the data in a lower-dimensional space without information loss. GRRF eliminates the features with the lowest importance scores. Removing these features does not affect the performance of the classifier, as they carry little information. In HAR solutions, the continuous time-series data for various activities is segmented according to the window size, and all features are extracted in each window. Also, various transitions are involved in each signal, as the signal is continuous. Hence, substantial information loss may affect the classification accuracy of the HAR solution. After exhaustive experiments based on the classification accuracy on unseen data for various activities, we observe no information loss, and the selected features are highly relevant for identifying the activities accurately.
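The form \(GI = 1 - \sum _{i}(p_i)^2\) quoted above agrees with the Gini index of equation (3). As a short check, writing \(p_i\) for the class fractions at a node and using \(\sum _{i=1}^{n_c} p_i = 1\):

$$\begin{aligned} \sum _{i=1}^{n_c} p_i(1-p_i) = \sum _{i=1}^{n_c} p_i - \sum _{i=1}^{n_c} p_i^2 = 1 - \sum _{i=1}^{n_c} p_i^2 \end{aligned}$$

For instance, a node with class fractions (0.5, 0.5) gives \(GI = 1 - (0.25 + 0.25) = 0.5\), and a pure node gives \(GI = 0\).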

The gain of GRRF is defined as

$$\begin{aligned} \left. \begin{array}{r@{\;}l} G_{GRRF}(X_k, v) = G(X_k, v) \quad if\, k \in F \\ G_{GRRF}(X_k, v) = \alpha _k G(X_k, v) \quad if\, k \notin F \end{array}\right\} \end{aligned}$$
(7)

where \(\alpha _k\) is the regularization parameter of each feature and \(\alpha _k\) is represented as

$$\begin{aligned} \alpha _k = (1 - \gamma )\lambda + \gamma \dfrac{{Impscore}_k}{{Impscore}_{max}} \end{aligned}$$
(8)

where

  • \({Impscore}_k\) = Importance score of \(X_k\) from an RF.

  • \({Impscore}_{max}\) = Maximum importance score over all features.

  • \(\dfrac{{Impscore}_k}{{Impscore}_{max}} \in [0, 1]\) is the normalized importance score.

  • \(\gamma \in [0, 1]\) controls the weight of the importance score from RF.

If we take \(\gamma =0\), then \(\alpha _k = \lambda\) for every feature and the importance scores play no role; with \(\lambda = 1\), as in GRF, the method reduces to a normal RF. As \(\gamma\) increases, features with lower importance scores are penalized more (Deng 2013). To retain a small number of features, we use the maximum penalty in our work, i.e., \(\gamma =1\), for which \(\alpha _k\) equals the normalized importance score.
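The effect of \(\gamma\) in equation (8) can be checked numerically with the following sketch. The function name `grrf_alpha` and the importance scores are hypothetical values chosen for illustration.

```python
def grrf_alpha(imp_scores, lam, gamma):
    """Per-feature regularization parameter alpha_k of equation (8)."""
    imp_max = max(imp_scores)
    if imp_max <= 0:
        raise ValueError("at least one importance score must be positive")
    return [(1.0 - gamma) * lam + gamma * imp / imp_max for imp in imp_scores]


# Hypothetical RF importance scores for three features.
scores = [0.05, 0.10, 0.20]

# gamma = 0: every feature gets the same penalty factor lam.
print(grrf_alpha(scores, lam=0.8, gamma=0.0))
# gamma = 1: alpha_k is the normalized importance score, so the least
# important feature has its gain scaled down the most.
print(grrf_alpha(scores, lam=0.8, gamma=1.0))
```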

Algorithm 1 presents the GRRF algorithm. In GRRF, the importance score is calculated from an RF by aggregating over all the trees of the RF. Using the RF importance score alone as a feature selection method requires either fixing the number of features to select or applying a threshold on the feature importance. GRRF instead uses a double regularization, based on the RF feature importance and on penalizing each feature individually. Hence, guided regularization generates a subset of non-redundant and representative features.

5.5 Classification

Building on the statistical feature extraction methods and the GRRF feature selection method, we use four different shallow machine learning (ML) algorithms, namely NB, Decision Tree (DT), Support Vector Machine (SVM), and RF, to implement a HAR model. The effectiveness of the feature extraction and GRRF based feature selection is evaluated using the aforementioned algorithms with two different datasets. To assess the impact of feature extraction and selection on the classifier, we use accuracy, precision, and recall as evaluation metrics. These evaluation metrics validate the proposed integrated approach to recognize human physical activities based on smartphones accurately and efficiently.

6 Performance evaluation

In this section, we discuss the experimental setup and also demonstrate the experimental results in terms of accuracy, precision, recall, and F-score. We also show the Receiver Operating Characteristic (ROC) curve to demonstrate a robust and valid comparison with other methods.

6.1 Experimental setup

After preprocessing of the data, the time-series signals are divided into segments known as time windows. Time windows are used for feature extraction. In the case of the self-collected dataset, each signal is divided into 5 s sliding windows with an overlap of 2 s between two consecutive windows, as a 2 s to 5 s sliding window with a 20 Hz to 50 Hz frequency is ideal for the segmentation of the collected data. In the case of the UCI standard dataset, each signal is divided into fixed-width sliding windows of 2.56 s with 50% overlap. After extracting the features based on the aforementioned time windows, both datasets are divided into two groups: 70% of the volunteers are selected for training and 30% for testing the proposed HAR solution. Hence, the same subject’s data never appears in both the training and testing sets. Feature selection using GRRF is performed only after splitting the dataset into train and test sets, to avoid overly optimistic results. We use 10-fold cross-validation for feature selection to get the most relevant and non-redundant features. The parameters involved in GRRF are the number of input variables, the total number of trees \((n_{tree})\), the penalty factor for the features not selected in previous nodes \((\lambda )\), and the regularization factor \((\gamma )\), which are optimized by a grid search using 10-fold cross-validation on the training set. Then, the obtained feature sets are given as input to each of the four classifiers.
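A subject-wise split of this kind can be sketched as follows. The function name `split_by_subject`, the random seed, and the toy subject identifiers are assumptions for illustration; the point is that whole subjects, not individual windows, are assigned to the training or the test set.

```python
import random


def split_by_subject(subject_ids, train_fraction=0.7, seed=0):
    """Return (train_idx, test_idx) so that no subject appears in both sets."""
    subjects = sorted(set(subject_ids))
    random.Random(seed).shuffle(subjects)
    n_train = round(train_fraction * len(subjects))
    train_subjects = set(subjects[:n_train])
    train_idx = [i for i, s in enumerate(subject_ids) if s in train_subjects]
    test_idx = [i for i, s in enumerate(subject_ids) if s not in train_subjects]
    return train_idx, test_idx


# Hypothetical dataset: 10 subjects with 3 windows each.
subject_ids = [s for s in range(10) for _ in range(3)]
train_idx, test_idx = split_by_subject(subject_ids)
train_subjects = {subject_ids[i] for i in train_idx}
test_subjects = {subject_ids[i] for i in test_idx}
print(len(train_subjects), len(test_subjects))  # 7 3
```

Feature selection would then be fitted on the rows in `train_idx` only and applied unchanged to the rows in `test_idx`.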

As the self-collected dataset contains 15562 instances and the UCI dataset contains 10299 instances, we randomly divide the training data (70% of the instances) into 10 different subsets. For both datasets, the subsets are mutually exclusive and each contains approximately the same number of instances. In each iteration, one subset is reserved for validation and the remaining subsets are used for training. The total number of iterations is 10 for both datasets to train the models using the four different classifiers. Finally, the average over all the iterations for each classifier is taken as the final result, i.e., the number of selected features, accuracy, precision, recall, and F-score. For DT, NB, RF, and SVM we use default parameters. The RBF kernel is used in SVM with C = 0.1 and gamma = 0.5. For each experiment, the level of significance is 0.05. All the experiments are performed using a \(10^{th}\) Generation Intel(R) Core(TM) i5-10210U Processor (6MB Cache, up to 4.2 GHz) with 16 GB of memory. To validate the performance of the proposed model, we compare the results of all four aforementioned classifiers using GRRF with the results of the same classifiers using the ReliefF and GRF feature selection methods on two different datasets. ReliefF is a widely used filter approach to feature selection in the HAR domain according to the HAR literature, and GRF is also an RF-based feature selection method. Hence, we have taken ReliefF and GRF as benchmark schemes against which to compare our proposed method.
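The 10-fold procedure described above can be sketched as an index generator. The function name `k_fold_splits`, the seed, and the instance count in the example are hypothetical; the sketch only illustrates how mutually exclusive, nearly equal-sized subsets are formed and rotated.

```python
import random


def k_fold_splits(n_instances, k=10, seed=0):
    """Yield (training, validation) index lists for k-fold cross-validation."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    # Dealing the shuffled indices round-robin gives k disjoint folds whose
    # sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


# Hypothetical training set of 103 instances.
splits = list(k_fold_splits(103))
print(len(splits), sorted({len(val) for _, val in splits}))  # 10 [10, 11]
```

The reported metric would then be the average of the per-fold results.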

6.2 Experimental results

Experimental results show that GRRF selects the fewest features and that the classifiers perform better with this reduced feature vector. This indicates that the GRRF feature selection process retains relevant and non-redundant features. The experimental results on the two datasets are tabulated in the following sections. We compare GRRF not only with the two benchmark schemes but also, using the four classifiers, with several state-of-the-art feature selection methods. Moreover, we compare the proposed HAR framework with deep learning algorithms, in which feature learning is performed automatically.

6.2.1 Experimental results on self-collected dataset

To assess the effectiveness of the proposed method, we evaluate it alongside two benchmark methods (ReliefF and GRF). The feature selection algorithms are run with their default parameter settings. We also examine the effectiveness of the selected features with four classifiers, using accuracy, precision, recall, and F-score as evaluation metrics. The number of features selected on the self-collected dataset is reported in Table 1. On this dataset, GRRF selects fewer features than ReliefF and GRF. To validate the proposed method, we use four popular classifiers: DT, NB, RF, and SVM. The features selected by each method (Table 1) are fed to these four classifiers. The accuracy, precision, recall, and F-score of each classifier with the three feature selection methods are tabulated in Table 2. On the self-collected dataset, every classifier performs better with GRRF than with ReliefF or GRF, which indicates that GRRF selects more relevant features than either benchmark. Among the classifiers, SVM gives the highest accuracy: with GRRF, DT, NB, RF, and SVM achieve classification accuracies of 94.59%, 96.54%, 97.74%, and 99.10%, respectively. From these results, we conclude that the proposed method yields higher accuracy with fewer features for all four classifiers. Table 3 reports the training and testing times of the classifiers with the three feature selection methods. With GRRF, the training and testing times of all classifiers are lower than with the other feature selection methods. These results also validate our self-collected dataset. Table 4 shows the confusion matrix of the SVM classifier with GRRF on the self-collected dataset, from which the type I (false positive) and type II (false negative) classification errors can be read. Figure 2 shows the per-activity classification accuracy (in percent) of SVM with ReliefF, GRF, and GRRF for the six activities, and Fig. 3 shows the ROC curves of the four classifiers using GRRF on the self-collected dataset.
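As a rough illustration of how the evaluation metrics and the type I and type II errors relate to a confusion matrix, the sketch below derives them per class. The function name `per_class_metrics` is hypothetical, and the counts are made up for three activities; they are not the values of Table 4.

```python
def per_class_metrics(confusion, labels):
    """Derive precision, recall, and F-score per class from a confusion matrix.

    confusion[i][j] counts instances of true class i predicted as class j.
    """
    n = len(labels)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    report = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp   # type I errors
        fn = sum(confusion[k]) - tp                        # type II errors
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f_score = (2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
        report[label] = {"fp": fp, "fn": fn, "precision": precision,
                         "recall": recall, "f_score": f_score}
    return correct / total, report

# Made-up counts for illustration only (rows: true class, columns: predicted).
labels = ["walking", "upstairs", "sitting"]
confusion = [
    [48, 2, 0],
    [3, 45, 2],
    [0, 0, 50],
]
accuracy, report = per_class_metrics(confusion, labels)
```

For a given activity, the off-diagonal entries in its column are its false positives and the off-diagonal entries in its row are its false negatives, which is why confusions between similar activities such as walking and moving upstairs show up symmetrically in both classes' scores.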

Table 1 Average number of selected features using ReliefF, GRF and GRRF for both the datasets
Table 2 Average performance of the classifiers using self-collected data
Table 3 Average training and testing time using self-collected data
Table 4 Confusion matrix of proposed method for HAR using SVM with self-collected dataset
Fig. 2 Classification accuracy of SVM classifier for all the six activities using our collected dataset

Fig. 3 ROC Curve for the classifiers using self-collected dataset

6.2.2 Experimental results on UCI public dataset

On the UCI public dataset, GRRF also selects fewer features than ReliefF and GRF, as shown in Table 1. To validate the proposed method, we again use the four classifiers DT, NB, RF, and SVM. The accuracy, precision, recall, and F-score of each classifier with the three feature selection methods are tabulated in Table 5, and the corresponding training and testing times in Table 6. On the UCI public dataset, every classifier performs better with GRRF than with ReliefF or GRF.

Table 5 Average performance of the classifiers using UCI public dataset
Table 6 Average training and testing time using UCI public dataset

This indicates that GRRF selects more relevant features than ReliefF and GRF. As with our own dataset, SVM gives the highest accuracy among the classifiers on the UCI public dataset.

With GRRF, DT, NB, RF, and SVM achieve classification accuracies of 95.74%, 97.54%, 98.79%, and 99.30%, respectively, on the UCI dataset; SVM again performs best. Table 7 shows the confusion matrix of the SVM classifier with GRRF on the UCI dataset, from which the type I (false positive) and type II (false negative) classification errors can be read. Figure 4 shows the per-activity classification accuracy (in percent) of SVM with ReliefF, GRF, and GRRF for the six activities, and Fig. 5 shows the ROC curves of the four classifiers using GRRF on the UCI dataset.

Table 7 Confusion matrix of proposed method for HAR using SVM with UCI dataset

On both datasets, GRRF selects fewer features than the other feature selection methods considered. As a result, the proposed HAR system recognizes human activities in much less time than the ReliefF-based and GRF-based HAR systems on both datasets, as shown in Tables 3 and 6. This indicates that GRRF reduces the training time, and hence the computational cost, of the proposed model. Across both datasets and all four classifiers, the proposed model with GRRF outperforms the same model with ReliefF or GRF. On both datasets SVM gives the highest accuracy with GRRF: 99.10% on our collected dataset and 99.30% on the UCI dataset.

Fig. 4 Classification accuracy of SVM classifier for all the six activities using UCI public dataset

Fig. 5 ROC Curve for the classifiers using UCI public dataset

6.2.3 Comparison with state-of-the-art methods

Filter-based and wrapper-based feature selection methods are commonly used in almost every domain. We therefore also compare GRRF with two filter-based methods, Information Gain (IG) and the Chi-square test, and two wrapper-based methods, forward selection and backward elimination. Moreover, we compare the accuracy of the proposed method with that of other popular feature selection methods: mRMR, CFS, and FCBF. We implement all of these methods on both datasets. Table 8 reports the experimental results of these approaches and of the proposed approach. Our proposed approach achieves superior performance over these state-of-the-art approaches. Both information gain and the Chi-square test are skewed toward features with greater dispersion. In wrapper methods, on the other hand, the model hypothesis search is embedded within the feature subset search. A search procedure over the space of possible feature subsets is defined, and various feature subsets are generated and evaluated. This strategy is tied to a single classification algorithm, since a specific subset of features is evaluated by training and testing a specific classification model. A search algorithm is thus “wrapped” around the classification model to search the space of feature subsets. Because this space grows exponentially with the number of features, heuristic search methods are used to guide the search for an optimal subset.
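The wrapper idea described above can be sketched with greedy forward selection, one of the heuristic searches mentioned. This is a minimal illustration under stated assumptions: `forward_selection` and `toy_evaluate` are hypothetical names, and `toy_evaluate` is an invented scoring function standing in for "train and test the chosen classifier on this feature subset".

```python
def forward_selection(n_features, evaluate, max_features=None):
    """Greedy wrapper search: repeatedly add the feature that most improves the score.

    evaluate(subset) is assumed to train and test the chosen classifier on the
    given tuple of feature indices and return a score such as accuracy.
    """
    selected, best_score = [], float("-inf")
    remaining = set(range(n_features))
    limit = n_features if max_features is None else max_features
    while remaining and len(selected) < limit:
        # Score every one-feature extension of the current subset.
        candidates = {f: evaluate(tuple(selected + [f])) for f in sorted(remaining)}
        best_feature = max(candidates, key=candidates.get)
        if candidates[best_feature] <= best_score:
            break  # no extension improves the score, so stop searching
        selected.append(best_feature)
        best_score = candidates[best_feature]
        remaining.remove(best_feature)
    return selected, best_score

def toy_evaluate(subset):
    """Invented scorer: features 1 and 3 help, feature 4 is redundant with 1."""
    useful = {1: 0.30, 3: 0.20}
    score = 0.5 + sum(useful.get(f, 0.0) for f in set(subset))
    if 4 in subset and 1 not in subset:
        score += 0.25                      # only informative when 1 is absent
    return score - 0.01 * len(subset)      # small cost per extra feature

selected, score = forward_selection(5, toy_evaluate)
```

The greedy search evaluates on the order of the square of the number of features rather than all exponentially many subsets, but each evaluation still retrains the classifier, which is why wrapper methods are costlier than filters and tied to the classifier used.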

Table 8 Comparison with state-of-the-art methods w.r.t. accuracy

6.2.4 Comparison with DL classifiers

As DL classifiers have outperformed approaches based on handcrafted features in almost every field, we compare our proposed approach with two popular DL classifiers, namely the Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM), on the publicly available UCI dataset. For the CNN and LSTM, we use 64 filters, kernel size = 3, activation function = 'relu', pool size = 2, number of epochs = 50, and batch size = 20. We compare the per-activity accuracy of CNN and LSTM with automatically extracted features, CNN with handcrafted features, LSTM with handcrafted features, and our proposed method using an SVM classifier. Table 9 reports the accuracy for the different activities. The results show that CNN and LSTM with automatically extracted features give poorer results than our proposed method, which uses a shallow ML algorithm, for almost all activities. CNN with handcrafted features gives better results than CNN with automatically extracted features but remains poorer than our proposed method with SVM. The same is true of LSTM. Only the SVM classifier is considered here because it gives the highest accuracy among the four classifiers. In HAR, the signal patterns of similar activities are almost the same, so the features automatically extracted from similar activities are also very similar. Thus, deep learning methods are sometimes unable to differentiate similar activities such as walking and walking upstairs. From the experimental results reported in Table 9, we conclude that handcrafted features are of significant importance in the HAR domain. Hence, ML algorithms that rely on feature extraction as a separate phase perform well in the HAR domain.
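To give a sense of what the stated CNN settings imply, the following sketch works through the arithmetic for a single convolution and pooling stage. The network layout is not specified in the text, so several things here are assumptions: one convolution followed by one max-pooling layer, no padding, stride 1, and an input of one UCI window with 128 samples and 9 inertial channels. The helper names are hypothetical.

```python
def conv1d_output_length(length, kernel_size, stride=1):
    """Output length of a 1-D convolution with no padding ('valid')."""
    return (length - kernel_size) // stride + 1

def pool1d_output_length(length, pool_size):
    """Output length of non-overlapping max pooling (stride equals pool size)."""
    return length // pool_size

def conv1d_parameter_count(in_channels, filters, kernel_size):
    """Number of trainable values: kernel weights plus one bias per filter."""
    return filters * (kernel_size * in_channels + 1)

# Assumed input: one window of 128 samples with 9 inertial channels.
timesteps, channels = 128, 9

after_conv = conv1d_output_length(timesteps, kernel_size=3)               # 126
after_pool = pool1d_output_length(after_conv, pool_size=2)                # 63
parameters = conv1d_parameter_count(channels, filters=64, kernel_size=3)  # 1792
```

Under these assumptions the stage turns each 128 x 9 window into a 63 x 64 feature map, so the automatically learned representation is considerably larger than a compact selected feature set.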

Table 9 Comparison with DL methods using UCI public dataset

7 Conclusion

In this work, we evaluated the performance of our proposed method on two datasets: a self-collected dataset and the UCI public dataset. Experiments show that, with four different classifiers, our method produces better results than the benchmark schemes based on ReliefF and GRF feature selection in the smartphone-based HAR domain. The proposed method also selects fewer features than ReliefF and GRF, and it outperforms these benchmark schemes in terms of training time, testing time, precision, recall, F-score, accuracy, and ROC on both datasets. We also compare our proposed HAR framework with popular filter-based and wrapper-based feature selection methods; GRRF outperforms these state-of-the-art feature selection approaches with all four classifiers. Recently, DL approaches have been considered state-of-the-art in almost all domains owing to their automatic feature learning. However, in our experiments DL approaches did not perform as well in the HAR domain. To support this claim, we compare, on the UCI public dataset, the commonly used CNN and LSTM, with automatically extracted features alone and with a fusion of handcrafted and automatically extracted features, against SVM with GRRF. The results demonstrate that SVM with GRRF performs better than the other approaches. It is also pertinent to mention that LSTM with handcrafted features performs better than LSTM and CNN with automatic feature learning only. We can also conclude that a smaller number of features reduces the training time of the proposed model.

Although much work has been done in this field, our findings show that many challenges remain unsolved, particularly in activity recognition. Several directions therefore remain for future work. First, the GRRF feature selection method should be compared with other recent feature selection methods and evaluated on more of the available datasets with different classifiers. Second, the applicability of the proposed method in real-life applications should be analyzed.