
Indoor Activity Recognition by Combining One-vs.-All Neural Network Classifiers Exploiting Wearable and Depth Sensors

Benoît Delachaux, Julien Rebetez, Andres Perez-Uribe, and Héctor Fabio Satizábal Mejia

IICT, HEIG-VD, University of Applied Sciences Western Switzerland (HES-SO)
benoit.delachaux@gmail.com, {julien.rebetez,andres.perez-uribe,hector-fabio.satizabal-mejia}@heig-vd.ch

Abstract. Activity recognition has recently gained a lot of interest and appears to be a promising approach to help the elderly population pursue independent living. There already exist several methods to detect human activities based either on wearable sensors or on cameras, but few of them combine the two modalities. This paper presents a strategy to enhance the robustness of indoor human activity recognition by combining wearable and depth sensors. To exploit the data captured by those sensors, we used an ensemble of binary one-vs-all neural network classifiers. Each activity-specific model was configured to maximize its performance. The performance of the complete system is comparable to lazy learning methods (k-NN) that require the whole dataset.

Keywords: Smart home, Activity recognition, Artificial neural networks, One-vs-all binary classifier fusion, Wireless sensors, Depth sensors

1 Introduction

Supporting the quality of life of the aging population is one of the greatest challenges facing our societies today. With age come neurodegenerative diseases, memory loss, and an increased risk of falls and broken limbs. However, smart home technologies offer new opportunities to help the elderly population pursue independent living. In our work, we are interested in using activity recognition to build applications that can help people with cognitive or memory problems. We focus on indoor environments, such as an apartment, which can be equipped with static sensors (e.g., Kinect depth cameras). In addition, the users can also carry wearable sensors, which can be hidden in clothes or in accessories (e.g., a watch or a smartphone). In this context, an activity recognition system should be able to (i) recognise a wide range of activities and (ii) remain flexible with regard to the loss of one or more sensor modalities.
There are numerous previous works in activity recognition [7, 10, 13]. Many
works target a medical application: for instance, Amft et al. [1] use a variety
of on-body sensors to perform dietary monitoring and therefore help patients

with their nutrition. They use multiple modalities (accelerometers, microphones, and electromyogram) to detect various activities like chewing, swallowing, drinking, cutting, and eating soup. Hondori et al. [5] present a system that helps monitor
ing, cutting, eating soup. Hondori et al. [5] present a system that helps monitor
various dining activities of post-stroke patients using a Kinect camera and ac-
celerometers. Kepski et al. [6] built a system that uses the Kinect and a single
accelerometer to perform fall detection.
In this paper, we explore the possibility of a smart home system that would
recognize high-level daily activities (e.g., eating, cleaning, drinking, reading)
using artificial neural networks, for instance to help people with memory loss,
by recording a visual log of their daily activities. It has been shown that such a system can help those people by acting as a cognitive compensation system [3]; at the same time, given that memory loss can be frustrating, it can provide positive feedback by letting them recall their recent activities.
This contribution is organized as follows: In Section 2, we present the dataset
we used to conduct our experiments. In Section 3, we explain the various steps
of our activity recognition pipeline. Section 4 discusses the results we obtained
and finally Section 5 presents our conclusions.

Fig. 1. Kinect-extracted skeleton for two sample poses: (a) standing and (b) lying on the floor. The skeleton is overlaid in green, with yellow dots indicating joints with a low confidence score reported by the sensor. The skeleton provided by the Kinect is reliable when the person is standing in front of the sensor, but it is unreliable when parts of the person's body are occluded.

2 Dataset
Several activity recognition datasets are available and allow researchers to benchmark their algorithms. However, even recent datasets [9] do not include depth sensors. Our dataset consists of recordings of typical daily living activities performed by one subject so far, but we are working on a database in which several subjects will take part in the recordings1. The recording environment simulates the setup of a small apartment with one Kinect camera per room. We decided to use Kinect and accelerometer sensors because they are both relatively cheap hardware (compared to a motion-capture setup, for example) and are not too invasive for the user of our system. In addition to its skeleton-tracking capabilities, the Kinect can also be used as a camera and therefore allows us to show pictures of the activities to the user, which are more easily interpretable than text labels, as shown by Browne et al. [3]. In order to respect the privacy of the person, the system might not store the video after it has been processed (i.e., it will only store the snapshots, the duration of the activities, etc.).

1 The current database is available upon request. The complete database involving several subjects will soon be available on our website http://ape.iict.ch.
During the sequence, the subject performs various daily activities, including: Read, Sleep, Sit idle, Dress, Undress, Brush teeth, Clean a table, Work at the computer, Tidy up the wardrobe, Pick something up, and Sweep the floor, repeating each activity multiple times. The duration of the first recording is 2176 seconds, which corresponds to 65'057 samples. Each sample consists of the skeleton2 of the person, obtained using Microsoft's SDK [12] from the Kinect sensor placed at a height of around 2 m in the rooms, and the 3-axis acceleration from 5 wireless inertial measurement units (IMUs) from Shimmer Research3, placed on the wrists, the ankles, and the back. In addition, for each timestamp the Kinect also gives us an RGB image that we used to label the data (by hand) and to validate our system. Figure 1 shows two examples of Kinect captures with the extracted skeleton overlaid in green. In-house software was used to synchronize the accelerometers with the Kinect. Figure 2 shows the activity labels of the whole sequence used in our experiments.
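The synchronization itself was done with in-house software; as a rough illustration of the general idea (not the authors' actual tool), the sketch below aligns each Kinect frame with the nearest-in-time sample of one accelerometer stream using pandas. The file names, column names, and the 50 ms tolerance are hypothetical.

```python
import pandas as pd

# Hypothetical CSV files: each row carries a 'timestamp' column in seconds.
kinect = pd.read_csv("kinect_skeleton.csv").sort_values("timestamp")
imu = pd.read_csv("imu_back.csv").sort_values("timestamp")

# Attach to each Kinect frame (~30 Hz) the nearest-in-time IMU sample,
# tolerating at most 50 ms of offset between the two streams.
synced = pd.merge_asof(kinect, imu, on="timestamp",
                       direction="nearest", tolerance=0.05)

print(synced.head())
```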

Fig. 2. Activity labels (ground truth) corresponding to the sequence of 65'057 samples; the plot shows the activity label as a function of the sample index.

3 Learning Pipeline
Activity recognition was achieved by combining a set of binary classifiers based
on feedforward artificial neural networks (i.e., Multi-layer Perceptrons).
2 The Kinect sensor tracks the position (relative to the sensor) of 20 body joints at a rate of about 30 Hz.
3 http://shimmer-research.com

We decided to use a one-vs-all approach [4] and trained a neural network for
each activity. Each individual neural network was trained to distinguish all the
samples belonging to one activity from a randomly chosen set of samples be-
longing to all other activities. The main reason for this choice is that different
activities might require different features. For example, one might assume that
the position of the feet is not relevant to detect the drink activity.
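As a minimal sketch of this one-vs-all setup, the code below builds, for each activity, a binary training set (all samples of that activity versus a random subset of the remaining samples) and trains one small MLP per activity. It uses scikit-learn's MLPClassifier as a stand-in for the FENNIX networks used in the paper; the negative-to-positive ratio and the training settings are illustrative assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def make_binary_set(X, y, activity, neg_per_pos=1, rng=None):
    """Samples of one activity vs. a random subset of all other samples."""
    rng = np.random.default_rng(rng)
    pos = np.where(y == activity)[0]
    neg_pool = np.where(y != activity)[0]
    n_neg = min(len(neg_pool), neg_per_pos * len(pos))
    neg = rng.choice(neg_pool, size=n_neg, replace=False)
    idx = np.concatenate([pos, neg])
    return X[idx], (y[idx] == activity).astype(int)

def train_one_vs_all(X, y, hidden_units_per_activity):
    """Train one binary MLP per activity (stand-in for the FENNIX networks).
    The constant learning rate approximates the paper's decreasing schedule."""
    models = {}
    for activity, n_hidden in hidden_units_per_activity.items():
        Xa, ya = make_binary_set(X, y, activity)
        clf = MLPClassifier(hidden_layer_sizes=(n_hidden,), activation="tanh",
                            solver="sgd", learning_rate_init=0.025,
                            momentum=0.7, max_iter=250)
        models[activity] = clf.fit(Xa, ya)
    return models
```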
Our learning pipeline consists of the following steps: acceleration and
skeleton acquisition, signal filtering, window-size selection, neural network com-
plexity (number of hidden neurons) selection, and input selection, for each indi-
vidual binary classifier. We performed parameter exploration to help us choose
the best value for each parameter of this learning pipeline, when possible. This
means that running the whole pipeline can be quite time-consuming, but most
parameters (e.g., window size, topology) that are found optimal on our test se-
quences are likely to be near-optimal on new similar sequences that our system
might encounter in the future.
Acceleration and Skeleton Acquisition. We used 5 three-axis accelerometers: two were placed on the legs, one on the back, and one on each wrist of the subject. This gives a subtotal of 3 × 5 = 15 inputs from the accelerometers. In addition, we used the following information from the Kinect skeleton data: the position of the person in the room, the positions of the left and right hands, elbows and hips, and the position of each hand relative to the corresponding shoulder. This gives a subtotal of 3 × 9 = 27 inputs from the Kinect, so we have a total of 15 + 27 = 42 input values for each frame.
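For concreteness, the sketch below assembles the 42-dimensional input vector of one frame from these quantities. The dictionary keys and helper names are hypothetical; only the 15 + 27 = 42 layout follows the text.

```python
import numpy as np

# Per frame: 5 IMUs x 3 axes = 15 acceleration values, and 9 Kinect
# quantities x 3 coordinates = 27 skeleton-derived values.
IMU_SITES = ["wrist_right", "wrist_left", "leg_right", "leg_left", "back"]
KINECT_FEATURES = ["position", "left_hand", "left_elbow", "left_hip",
                   "right_hand", "right_elbow", "right_hip",
                   "hand_rel_shoulder_left", "hand_rel_shoulder_right"]

def frame_vector(accel, skeleton):
    """accel: dict site -> (x, y, z); skeleton: dict feature -> (x, y, z)."""
    acc = np.concatenate([np.asarray(accel[s], dtype=float) for s in IMU_SITES])
    kin = np.concatenate([np.asarray(skeleton[f], dtype=float)
                          for f in KINECT_FEATURES])
    vec = np.concatenate([acc, kin])
    assert vec.shape == (42,)   # 15 accelerometer + 27 Kinect inputs
    return vec
```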
Signal Filtering. We used a 1 Hz low-pass filter to only consider the low-frequency components of the subject's movements. Hence, we mostly kept data about the posture of the subject when performing an activity. Indeed, most of the activities we are interested in have quite different postures.
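One possible realization of this filtering step is sketched below, assuming a sampling rate of roughly 30 Hz (65'057 samples over 2176 s) and a zero-phase Butterworth filter; the paper does not specify the filter type or order, so both are assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def lowpass_1hz(signal, fs=30.0, order=4):
    """Zero-phase 1 Hz low-pass; keeps the slow, posture-related components."""
    b, a = butter(order, 1.0, btype="low", fs=fs)
    return filtfilt(b, a, signal, axis=0)

# Example: filter all 42 input channels of an (n_samples, 42) array.
raw = np.random.randn(1000, 42)          # stand-in for the recorded signals
smooth = lowpass_1hz(raw)
```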
Window Size Selection. After the signal has been filtered, we used a sliding
window to extract feature vectors. We tried window sizes between 10 and 100
samples, and chose the one that gave us the least validation error. We used the absolute error ε = |y_ground truth − y_predicted| to evaluate the training performance
of each single neural network for this step. Although not all activities have the
same optimal window size, we used a single window size for all activities (for
simplicity). We chose a window size of 30 samples, which was a good choice for
almost all activities. We tested two overlapping settings for our sliding windows:
50% and “99%” (just slide the window by one value). The windows are used to
sub-sample the original time-series by computing the mean.
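A minimal sketch of this windowing step follows: a 30-sample window slides by 1 sample ("99%" overlap) or by 15 samples (50% overlap), and each window is reduced to its per-channel mean. The majority-vote labeling of each window is our assumption; the paper does not state how window labels are derived.

```python
import numpy as np

def window_means(X, y, width=30, step=1):
    """Slide a window of `width` samples (step=1 -> "99%" overlap,
    step=width//2 -> 50% overlap) and average each window.
    Window labels are taken as the majority label (an assumption)."""
    feats, labels = [], []
    for start in range(0, len(X) - width + 1, step):
        w = slice(start, start + width)
        feats.append(X[w].mean(axis=0))
        vals, counts = np.unique(y[w], return_counts=True)
        labels.append(vals[np.argmax(counts)])
    return np.asarray(feats), np.asarray(labels)
```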
Topology Selection. Since we used one-vs-all networks, specific to each activ-
ity, such networks have a single output unit indicating whether the input corre-
sponds to the execution of the given activity (1) or not (0). We used sigmoidal
output neurons, thus output values between 0 and 1 represent the neural net-
work’s certainty regarding the classification of a particular input pattern. Since
we found that networks with a single hidden layer were sufficiently complex for our classification problem, the topology selection consisted of (i) determining the number of hidden units and (ii) selecting the input features.
Neural Network Complexity (hidden units). To determine the number of hidden units used in our networks, we tried values between 1 and 8 and performed a training/validation procedure using the whole dataset. From these results, we chose, for each activity, the best number of hidden units (i.e., the one that minimizes the classification error); those numbers are reported in Table 1. Hidden units used the hyperbolic tangent activation function.

Table 1. Number of hidden units chosen for each activity

sweep floor: 4, drink: 8, brush teeth: 7, undress: 7, sleep: 8, dress: 8, read: 1, clean table: 5, pick up: 8, tidy up: 8, sit idle: 5, work: 4
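A hedged sketch of this per-activity hidden-unit search is given below, using scikit-learn's cross_val_score as a stand-in for the paper's training/validation procedure with FENNIX.

```python
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

def best_hidden_units(X_bin, y_bin, candidates=range(1, 9)):
    """Pick the number of hidden units (1..8) maximizing cross-validated
    accuracy for one activity-specific binary classifier (a sketch, not the
    original FENNIX procedure)."""
    scores = {}
    for n in candidates:
        clf = MLPClassifier(hidden_layer_sizes=(n,), activation="tanh",
                            max_iter=500)
        scores[n] = cross_val_score(clf, X_bin, y_bin, cv=3).mean()
    return max(scores, key=scores.get)
```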

Binary Classifiers' Input Selection. As an additional optimisation, for each activity-specific neural network classifier, we used the sensitivity matrix method [11] to select the subset of inputs with the highest relevance for the one-vs-all classification task, using the whole dataset. Figure 3 shows the set of selected inputs for each activity-specific binary classifier.
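The sensitivity matrix method of [11] ranks inputs by how strongly the trained network's output reacts to changes in each input. The sketch below approximates such a relevance score by finite differences on a trained binary classifier; it is an illustrative stand-in, not the exact metric defined in [11], and the number of inputs to keep is a free parameter.

```python
import numpy as np

def input_sensitivity(model, X, eps=1e-2):
    """Approximate per-input relevance of a trained binary classifier by
    finite differences of its output probability (a rough stand-in for the
    sensitivity matrix of [11])."""
    base = model.predict_proba(X)[:, 1]
    relevance = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] += eps
        relevance[j] = np.mean(np.abs(model.predict_proba(Xp)[:, 1] - base)) / eps
    return relevance

def select_inputs(model, X, keep=20):
    """Indices of the `keep` most relevant inputs (keep is illustrative)."""
    return np.argsort(input_sensitivity(model, X))[::-1][:keep]
```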

Fig. 3. Matrix showing the selected inputs for each class (per-activity input selection). Each modality has 3 axes (x, y, z). For each activity (one row per activity), we select a different set of inputs (columns). A black square indicates that the input has been selected for the corresponding activity. On the left are the features computed from the accelerometers (wrist_right, wrist_left, back, leg_right, leg_left); on the right are the features computed from the Kinect (position, left_hand, left_elbow, left_hip, right_hand, right_elbow, right_hip, lh_shoulder, rh_shoulder); the red line shows the separation.

Binary Classifiers' Training and Fusion. Each Multi-Layer Perceptron (MLP) implementing an activity-specific one-vs-all binary classifier was trained using the FENNIX software4, developed by one of the authors. In particular, we used the Backpropagation algorithm [2] for between 50 and 250 epochs, with a learning rate decreasing from 0.025 to 0.001 and a momentum term of 0.7. The output of the ensemble of binary classifiers was the activity having the highest classification certainty value (i.e., the highest activity-specific one-vs-all classifier output), provided that this value was greater than 0.7. Otherwise, we considered the recognized activity to be the none activity, which is a special class for unknown activities.
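A minimal sketch of this fusion rule is shown below: each one-vs-all model produces a certainty for its activity, the most confident activity wins, and the none class is returned when no certainty exceeds the 0.7 threshold. It assumes the per-activity models from the earlier training sketch.

```python
import numpy as np

def fuse(models, x, threshold=0.7):
    """Combine the one-vs-all outputs: the highest certainty wins, unless no
    classifier is confident enough, in which case return 'none'."""
    certainties = {a: m.predict_proba(x.reshape(1, -1))[0, 1]
                   for a, m in models.items()}
    best = max(certainties, key=certainties.get)
    return best if certainties[best] > threshold else "none"
```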
4 http://fennix.sourceforge.net

4 Results

To evaluate the performance of our system, we compared it to a 10-NN (k Nearest Neighbors) classifier, which is often used as a benchmark classifier for activity recognition tasks. We chose k = 10 here because this gave the best results after some limited testing of different values of k. We performed two-fold cross-validation, creating 10 different training/validation pairs by sliding the training data window by 10% each time. Then, for each training/validation pair, we performed 20 runs, where we trained our model using the corresponding training set and then evaluated its validation performance using the rest of the dataset. Therefore, a total of 200 runs was performed per experiment. To compute the resulting performance of the classifier, we used the average (over all classes) of the f(1) score [8] shown in Equation 1, which estimates how accurately the classifier identifies a particular activity as such. In Equation 1, precision = tp/(tp + fp) and recall = tp/(tp + fn), where tp is the number of true positives, fp the number of false positives, and fn the number of false negatives.

f(1) = 2 · (precision · recall) / (precision + recall)    (1)
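The reported score is therefore the macro-averaged F1: Equation 1 computed per activity and then averaged over all classes. A small sketch using scikit-learn, with made-up labels purely for illustration:

```python
from sklearn.metrics import f1_score

# Macro-averaged F1: Equation 1 per activity, averaged over all classes,
# which is the score reported in Figures 4 and 5.
y_true = ["read", "sleep", "read", "none", "drink"]
y_pred = ["read", "sleep", "drink", "none", "drink"]
print(f1_score(y_true, y_pred, average="macro"))
```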

To fully profit from the use of both wearable and static sensors, we implemented a model-switching capability within the activity recognition system. The skeleton data provided by the Kinect carries a quality flag for each joint and frame, indicating the confidence in the reported joint position. When this flag was low for most of the joints being tracked, we considered the Kinect data unreliable and relied on a model based only on the accelerometers to recognize the activity being performed. To achieve this, we trained two models: one using both the wearable and the static depth-based sensors, and a second one using only the wearable accelerometers.
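A hedged sketch of this model-switching rule is shown below. It reuses the fuse() helper from the earlier fusion sketch; the 50% threshold on the fraction of well-tracked joints and the binary encoding of the quality flags are assumptions, since the paper does not give the exact switching criterion.

```python
def classify_with_switching(full_models, accel_models, x_full, x_accel,
                            joint_confidences, min_tracked=0.5):
    """Use the Kinect-based model only when enough joints are tracked with
    good confidence; otherwise fall back to the accelerometer-only model.
    joint_confidences: per-joint quality flags (assumed 1 = tracked, 0 = low)."""
    tracked_ratio = sum(c > 0 for c in joint_confidences) / len(joint_confidences)
    if tracked_ratio >= min_tracked:
        return fuse(full_models, x_full)      # fuse() from the earlier sketch
    return fuse(accel_models, x_accel)
```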

Fig. 4. Influence of model switching on classification performance (k-NN is shown for comparison). There are two plots: one where the none class is taken into account ("With none") and one where it is not ("Without none"). The window overlap parameter is indicated by the color. The number on top of each boxplot shows the value of the median, for ease of comparison.

To evaluate the usefulness of model switching on our dataset, we compared the performance of a model using only the accelerometers with that of a model using model switching. The results are shown in Figure 4. Notice that using model switching improves performance by about 5% when using a single-frame window overlap ("99%"). The gain is smaller when using a 50% window overlap. One reason that might explain this is that when using a "99%" window overlap, we have roughly 14 times more data (windows have a length of 30) to train our networks. We can also see that using model switching allows us to get performance very close to that of k-NN.
Figure 5 shows the overall performance of our system with and without input pruning. We see that input pruning based on the sensitivity matrix method [11] gives a considerable performance improvement (around 14%) when using a "99%" window overlap, whereas the improvement is around 7% when using a 50% window overlap. As mentioned before, this is likely due to the fact that a larger window overlap means more training data. We also see that we obtain performance comparable to k-NN, which is an encouraging result given that we do not use the whole dataset to classify the input patterns.

Fig. 5. Classification performance (f(1) score) for our pipeline. On the left of each plot, k-NN is used as a benchmark. There are two plots: one where the none class is taken into account ("With none") and one where it is not ("Without none"). In the middle of each plot, the performance of our models without input pruning is shown, and on the right are the results for our models with input pruning. The window overlap parameter is indicated by the color. The number on top of each boxplot indicates the value of the median, for ease of comparison.

5 Conclusions
This paper presents an activity recognition system built using activity-specific
one-vs-all artificial neural networks. The experiments demonstrated that the
performance of the system is comparable with a k -NN classifier but with the ad-
vantage that, once trained, the training database can be discarded. This makes
it possible to use this kind of system on an embedded device like a smartphone
or a tablet. We also showed that input pruning and custom configuration of each
activity-specific model can be used to improve the performance of the system.

Last but not least, we showed that the combination of wearable motion sensors and static depth sensors can enhance not only the performance but also the robustness of indoor activity recognition systems, by taking advantage of one modality when the other is not reliable. Our results are quite encouraging, but further tests with several subjects are required to verify the generalization performance of such a system to a greater extent. Moreover, given that our aim is to work towards democratizing activity recognition, we will try to use as few sensors as possible in the future.

Acknowledgements. This work was funded by the Hasler Foundation and the RCSO-TIC of the HES-SO. We thank our colleagues from the Wearable Computing Lab, ETH, for useful discussions, as well as L. Heim, O. Hüsser and F. Tièche from the HE-ARC/HES-SO for their contribution regarding the use of the Kinect SDK.

References
1. Amft, O., Tröster, G.: Recognition of dietary activity events using on-body sensors.
Artificial Intelligence in Medicine 42(2), 121–136 (2008)
2. Bishop, C.: Neural networks for pattern recognition. OUP, USA (1995)
3. Browne, G., et al.: Sensecam improves memory for recent events and quality of life
in a patient with memory retrieval difficulties. Memory 19(7), 713–722 (2011)
4. Garcia-Pedrajas, N., Ortiz-Boyer, D.: An empirical study of binary classifier fusion
methods for multiclass classification. Information Fusion 12(2), 111–130 (2011)
5. Hondori, H., et al.: Monitoring intake gestures using sensor fusion (microsoft kinect
and inertial sensors) for smart home tele-rehab setting. In: 2012 1st Annual IEEE
Healthcare Innovation Conference (2012)
6. Kepski, M., Kwolek, B.: Fall detection on embedded platform using kinect and
wireless accelerometer. In: Miesenberger, K., Karshmer, A., Penaz, P., Zagler, W.
(eds.) ICCHP 2012, Part II. LNCS, vol. 7383, pp. 407–414. Springer, Heidelberg
(2012)
7. Lara, O.D., et al.: Centinela: A human activity recognition system based on accel-
eration and vital sign data. Pervasive and Mobile Computing 8(5), 717–729 (2012)
8. Rijsbergen, C.J.V.: Information Retrieval, 2nd edn. Butterworth-Heinemann,
Newton (1979)
9. Roggen, D., et al.: Collecting complex activity data sets in highly rich networked
sensor environments. In: Proceedings of the Seventh International Conference on
Networked Sensing Systems (INSS), pp. 233–240. IEEE CSP (2010)
10. Sagha, H., et al.: Benchmarking classification techniques using the Opportunity
human activity dataset. In: IEEE International Conference on Systems, Man, and
Cybernetics (2011)
11. Satizábal M, H.F., Pérez-Uribe, A.: Relevance metrics to reduce input dimensions
in artificial neural networks. In: de Sá, J.M., Alexandre, L.A., Duch, W., Mandic,
D.P. (eds.) ICANN 2007. LNCS, vol. 4668, pp. 39–48. Springer, Heidelberg (2007)
12. Shotton, J., et al.: Real-time human pose recognition in parts from single depth
images. In: Computer Vision and Pattern Recognition (CVPR), pp. 1297–1304.
IEEE (2011)
13. Stiefmeier, T., et al.: Wearable activity tracking in car manufacturing. IEEE
Pervasive Computing 7(2), 42–50 (2008)
