Vabalas 2019
autism diagnosis
Andrius Vabalas, Student Member, IEEE, Emma Gowen, Ellen Poliakoff and Alexander J. Casson, Senior Member, IEEE

A. Vabalas and A. J. Casson are with the School of Electrical and Electronic Engineering, The University of Manchester, UK. Email: {andrius.vabalas, alex.casson}@manchester.ac.uk
E. Gowen and E. Poliakoff are with the School of Biological Sciences, The University of Manchester, UK. Email: {emma.gowen, ellen.poliakoff}@manchester.ac.uk
A. Vabalas was supported by the UK Engineering and Physical Sciences Research Council and its Doctoral Training Partnership with the University of Manchester.

Abstract: Autism is a developmental condition primarily identified by social and communication deficits. However, over 70% of autistic individuals also show motor function deficits, which are evident even when simple stereotyped movements are performed. In this study, we asked 24 autistic and 22 non-autistic adults to perform pointing movements between two markers 30 cm apart as quickly and as accurately as they could for 10 seconds. Motion tracking was employed to collect data and calculate kinematic features of the movement and aiming accuracy. At the group level, the results showed that autistic individuals performed pointing movements more slowly but more accurately than non-autistic individuals. At the individual level, we used Machine Learning methods to predict autism diagnosis. Nested Cross-Validation was used, which in contrast to the commonly used K-fold Cross-Validation avoids pooling training and testing data and provides robust performance estimates. Our models achieved a statistically significant classification accuracy of 71% and showed that even a simple and short motor task enables discrimination between autistic and non-autistic individuals.

I. INTRODUCTION
Autism Spectrum Condition (ASC) is a group of complex developmental conditions primarily characterised by social and communication deficits. Stereotyped and repetitive movements are also recognised as a symptom but receive little attention in identifying ASC [1]. Nonetheless, growing research interest in motor impairment in ASC has demonstrated that balance, gait, praxis and visuomotor functions are implicated and that deficits show large, highly significant effect sizes [2]. Even very simple movement tasks like reach-to-grasp [3], repetitive hand pronation-supination and finger tapping [4] show differences between ASC and typically developing (TD) individuals.
In this study we asked adult ASC and TD participants to perform a simple and short movement task: to point between two points as quickly and as accurately as they can. The movements were objectively measured using motion tracking, and Machine Learning (ML) methods were applied to predict ASC diagnosis.
The strength of ML is the ability to find complex interactions between multiple variables, and this makes it well suited for identification of ASC biomarkers, as ASC is a complex and heterogeneous condition which can have different expressions between affected individuals. Additionally, the current diagnostic process is long and subjective, based on observation, interview, and questionnaire techniques applied by clinical experts. Automated algorithms assisting with ASC identification could speed up the diagnostic process and make it more objective. Several previous studies have applied ML methods for ASC identification and a few also used kinematic data: tracking gameplay with sensors on a tablet screen surface [5], tracking reach-and-throw movements with a ball and basket [6] and tracking a simple movement imitation task [7]. Those studies achieved high classification accuracy rates of 86.7% to 96.7%. However, these studies used K-fold Cross-Validation (CV), which has been demonstrated to produce overoptimistic performance estimates [8], especially when the sample size is small [9]. It was also not assessed whether classification performance was statistically significantly different from random guessing.
In this study, we explored whether kinematic characteristics of a short and easy-to-perform movement task can be used to help predict ASC diagnosis. We developed automated feature selection and classification algorithms. Importantly, we used nested CV, which avoids pooling training and testing data and gives an "almost unbiased estimate" [8]. Additionally, we assessed whether classification results were statistically significantly different from random guessing. The remainder of this paper is organised as follows. In Section II we present the methods used for data collection and ML analyses. Section III presents behavioural and ML results. Finally, conclusions are drawn in Section IV.

II. METHODS
A. Experiment and data
24 ASC (9 female, age 31.5 years) and 22 TD (7 female, age 30.8 years) IQ-matched participants performed a simple pointing task. Two 8 mm diameter red stickers were attached 30 cm apart on the horizontal surface of a table in front of seated participants, who were instructed: "With your index finger point between these two targets as quickly and as accurately you can". The task was performed with the dominant and the non-dominant hand. A Polhemus Fastrak motion tracker was used for kinematic data collection, with a single motion sensor attached to the distal phalange of the index finger. Movement was sampled at 120 Hz in X, Y, Z coordinates, and features based on velocity, acceleration, jerk and amplitude were calculated for each pointing movement (Table I). In
addition, accuracy features were calculated based on the area bounded by the points where the finger touched the horizontal surface when performing targeted movements to sticker locations (Fig. 1a) and on the average distance between all points (Fig. 1b). In total there were 60 features, which were means and standard deviations (SDs) calculated by pooling movements performed with the dominant and non-dominant hands. The experimental procedures involving human subjects described in this paper were approved by the University of Manchester research ethics committee, ref: 2017-2541-4204.

Fig. 1. Pointing accuracy measures. Black points represent locations where the participant touched the table surface when performing a pointing movement to a target. (a) Area bounded by pointing locations. (b) Distances between individual pointing locations. The red area illustrates a target sticker.

TABLE I
MOVEMENT MEASURES

1. Duration (s)
2. Peak velocity (mm/s)
3. Time to peak velocity (s)
4. Horizontal amplitude before peak acceleration (%)
5. Time before peak velocity (%)
6. Time after peak velocity (%)
7. Peak acceleration (mm/s²)
8. Time to peak acceleration (s)
9. Time to peak acceleration (%)
10. Horizontal amplitude at which peak acceleration occurs (%)
11. Peak deceleration (mm/s²)
12. Peak deceleration time (s)
13. Peak deceleration time (%)
14. Horizontal amplitude at which peak deceleration occurs (%)
15. Dimensionless jerk
16. Horizontal amplitude (mm)
17. Vertical amplitude (mm)
18. Peak vertical amplitude time (s)
19. Peak vertical amplitude time (%)
20. Horizontal amplitude of peak vertical amplitude (%)

B. Data preparation
Individual outliers for each participant were removed, and group outliers (1.4% of all data points) were replaced with group means. Outliers were identified using the non-recursive procedure recommended by Van Selst and Jolicoeur (1994) [10]. Features were normalised by transforming them to z-scores.

C. Classification algorithm
For classification, the Support Vector Machine (SVM) algorithm [11] was used. It separates the classes by maximising the gap between training examples from each class. To deal with classes that are not linearly separable, SVM uses kernel functions and a penalty parameter C, which weighs the importance of misclassification. In this study for classification SVM with

D. Feature selection
Before classification, the ML pipeline included feature selection. Four feature selection methods were used to reduce the feature space.
The SVM Recursive Feature Elimination (RFE) [13] algorithm selected features based on their importance for the SVM classifier in separating the classes. In this study, SVM-RFE started with the full feature set, and in each iteration it eliminated the one feature that the SVM deemed least important for separating the classes, using the weight vector of dimension length(s) as the ranking criterion [13]. The final feature set was taken from the iteration in which the SVM achieved the best classification performance.
Student's t-test (two-sample) was used as a filter feature selection method. The 10 features with the highest absolute value of the t-statistic, and thus with the most different means between the two classes, were selected for classification.
ReliefF weighs features by taking into account their interactions. It uses the K-nearest neighbour method to up-weight features which discriminate best from the neighbours of the different class. Thus, this method considers not only how strongly features are related to the observed class but also how distant they are from the opposite class. We set K to 23, half of the total sample size, and retained the 10 most discriminative features. ReliefF was implemented using scikit-rebate [14].
mRMR (minimum-redundancy maximum-relevance) is another filter method, which selects features that have the highest relevance and at the same time the lowest redundancy. It selects features which discriminate the categories well but are dissimilar to each other. Both the minimum redundancy and the maximum relevance criteria are based on mutual information. The ten top-ranked features were retained.

E. Result validation
In this study nested CV was used [15]. Nested CV, like the commonly used K-fold CV approach, validates the results iteratively in CV folds, using all of the available data for training and also reusing all of it for testing. Both validation methods are thus economical and well suited to small data sets, as is the case in this study. Nested CV, however, differs from K-fold CV in one significant respect: it avoids pooling train and test data. When nested CV is performed, a portion of the data is split off at the beginning of each CV fold for testing and a model is then developed on the reduced training set, including data normalisation, feature selection and parameter tuning. This is repeated iteratively, splitting off a different portion of
the data for validation and each time developing a new model from scratch, until all of the data has been used (Fig. 2). With the nested CV approach, test data is kept separate from model development, and in that respect this approach is similar to Train/Test Split validation. Varma and Simon (2006) [8] have demonstrated that nested CV produces almost unbiased performance estimates, while the K-fold CV approach, which pools train and test data, can produce significantly overoptimistic results. In this study 10-fold nested CV was used, and the performance of the model was calculated as the mean performance over the ten CV folds. Nested CV was performed 100 times, each time randomly splitting the data into training and testing sets, to obtain performance distributions.

Fig. 2. Nested validation. ACC: overall accuracy of the model; ACCi: accuracy in a single CV fold. [Diagram legend: all data, train data, test data; validation folds 1, 2, ..., n; repeated n times.]

TABLE II
POINTING MOVEMENT AND POINTING ACCURACY RESULTS

             Duration  Mean velocity  % time of vertical  Time to peak  Overall area
             (s)       (mm/s)         amplitude           velocity      (mm²)
ASC mean     0.42      775.98         0.43                0.19          30.87
TD mean      0.37      911.96         0.47                0.17          64.01
ASC SD       0.08      142.89         0.05                0.03          16.15
TD SD        0.07      212            0.06                0.03          65.75
t-statistic  2.47      2.57           2.77                2.34          2.29
p-statistic  0.02      0.01           0.01                0.02          0.03
Cohen's d    0.74      0.77           0.82                0.69          0.81
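The nested CV procedure of Section II-E could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses only the Python standard library, a nearest-centroid classifier stands in for the SVM, and the inner loop tunes the number of retained features (a hypothetical `k_grid`) as a stand-in for SVM parameter tuning. All function names are illustrative. The point it shows is that z-score normalisation, t-test feature selection and tuning are all fitted on the training rows of each outer fold only.

```python
import random
import statistics as st


def t_stat(a, b):
    """Two-sample Student t-statistic (pooled variance)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * st.variance(a) + (nb - 1) * st.variance(b)) / (na + nb - 2)
    if pooled == 0:
        return 0.0
    return (st.mean(a) - st.mean(b)) / (pooled * (1 / na + 1 / nb)) ** 0.5


def develop_model(X, y, k):
    """Fit z-scoring, t-test feature selection and a classifier on X, y only."""
    cols = list(zip(*X))
    mu = [st.mean(c) for c in cols]
    sd = [st.pstdev(c) or 1.0 for c in cols]  # guard against constant features

    def z(row):
        return [(v - m) / s for v, m, s in zip(row, mu, sd)]

    Z = [z(row) for row in X]
    by_class = {lab: [r for r, l in zip(Z, y) if l == lab] for lab in (0, 1)}
    # Filter selection: keep the k features with the largest |t|.
    score = [abs(t_stat([r[j] for r in by_class[1]], [r[j] for r in by_class[0]]))
             for j in range(len(cols))]
    keep = sorted(range(len(cols)), key=lambda j: -score[j])[:k]
    # Stand-in classifier: nearest class centroid (NOT the paper's SVM).
    centroid = {lab: [st.mean(r[j] for r in rows) for j in keep]
                for lab, rows in by_class.items()}

    def predict(row):
        zr = z(row)
        dist = {lab: sum((zr[j] - c) ** 2 for j, c in zip(keep, cen))
                for lab, cen in centroid.items()}
        return min(dist, key=dist.get)

    return predict


def make_folds(n, n_folds, rng):
    """Shuffle indices 0..n-1 and deal them into n_folds disjoint test sets."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]


def cv_accuracy(X, y, k, n_folds, rng):
    """Mean fold accuracy; each fold's model never sees its own test rows."""
    accs = []
    for test in make_folds(len(X), n_folds, rng):
        held = set(test)
        X_tr = [x for i, x in enumerate(X) if i not in held]
        y_tr = [lab for i, lab in enumerate(y) if i not in held]
        predict = develop_model(X_tr, y_tr, k)
        accs.append(sum(predict(X[i]) == y[i] for i in test) / len(test))
    return st.mean(accs)


def nested_cv(X, y, k_grid=(2, 5, 10), n_outer=10, n_inner=5, seed=0):
    """Outer folds estimate performance; inner CV tunes k on training rows only."""
    rng = random.Random(seed)
    accs = []
    for test in make_folds(len(X), n_outer, rng):
        held = set(test)
        X_tr = [x for i, x in enumerate(X) if i not in held]
        y_tr = [lab for i, lab in enumerate(y) if i not in held]
        best_k = max(k_grid, key=lambda k: cv_accuracy(X_tr, y_tr, k, n_inner, rng))
        predict = develop_model(X_tr, y_tr, best_k)
        accs.append(sum(predict(X[i]) == y[i] for i in test) / len(test))
    return st.mean(accs)
```

Here `X` is a list of feature rows and `y` a list of 0/1 labels. Calling `nested_cv(X, y, seed=s)` for 100 different seeds would give a distribution of accuracies comparable in spirit to the repeated procedure described above.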
F. Result significance
Result significance was assessed with permutation testing. The labels of the data samples were randomly permuted
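As a rough illustration of label-permutation testing in general, the sketch below estimates a p-value by comparing an observed score with scores obtained after shuffling the labels. The number of permutations, the `(count + 1) / (n + 1)` estimator and both function names are assumptions made for this example. The source does not state these details. In a study like this one, `score_fn` would rerun the whole model-development and validation procedure on the relabelled data; a toy scorer is used here to keep the example self-contained.

```python
import random


def permutation_p_value(score_fn, X, y, n_permutations=1000, seed=0):
    """Return (observed score, p-value) under random relabelling of the samples."""
    rng = random.Random(seed)
    observed = score_fn(X, y)
    shuffled = list(y)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any real link between features and labels
        if score_fn(X, shuffled) >= observed:
            at_least_as_good += 1
    # The +1 terms count the observed labelling as one of the permutations,
    # so the estimated p-value can never be exactly zero.
    return observed, (at_least_as_good + 1) / (n_permutations + 1)


def threshold_accuracy(X, y):
    """Toy scorer (hypothetical): predict class 1 when the single feature is > 0."""
    return sum((x[0] > 0) == bool(lab) for x, lab in zip(X, y)) / len(y)
```

A small p-value indicates that accuracies as high as the observed one are rare when the labels carry no information, which is the sense in which a classification result differs from random guessing.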
III. RESULTS
A. Behavioural results
Here we include results only for the measures which showed the most prominent differences between groups (two-sample t-test, Table II). ASC participants performed pointing movements more slowly than TD participants, as demonstrated by mean movement duration and velocity. ASC participants reached peak vertical amplitude earlier in the movement and peak velocity later in the movement compared to TD participants (Fig. 3). ASC participants, however, performed pointing movements more accurately than TD participants, as indicated by the average area covered by separate points to a single target sticker location (Fig. 1a, Table II). These differences were significant and had medium to large effect sizes as indicated by Cohen's d.

Fig. 3. Movement vertical amplitude (a) and velocity (b) averaged for ASC and TD participants. Shaded areas show the difference between groups. [Plot x-axis: movement progression, 0% to 100%.]

B. Classification results
Results of SVM-RBF coupled with different feature selection methods showed that statistically significant classification accuracy of 71%, with a sensitivity of 75%, specificity of

TABLE III
CLASSIFICATION RESULTS

Algorithm  Accuracy  Sensitivity  Specificity  p-value
t-test     71%       75%          66%          0.021*
ReliefF    70%       72%          68%          0.027*
SVM-RFE    62%       65%          59%          0.113
mRMR       60%       64%          55%          0.148
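The three performance columns of Table III are linked: overall accuracy is the class-size-weighted average of sensitivity and specificity. The short check below recomputes accuracy from the reported rates and the group sizes given in Section II-A. It assumes that ASC is the positive class (so sensitivity refers to the 24 ASC participants), which the source does not state explicitly; the function name is illustrative.

```python
def accuracy_from_rates(sensitivity, specificity, n_pos, n_neg):
    """Overall accuracy implied by per-class rates and class sizes."""
    correct = sensitivity * n_pos + specificity * n_neg
    return correct / (n_pos + n_neg)


# (sensitivity, specificity, reported accuracy) in percent, copied from Table III.
table_iii = {
    "t-test": (75, 66, 71),
    "ReliefF": (72, 68, 70),
    "SVM-RFE": (65, 59, 62),
    "mRMR": (64, 55, 60),
}

# Group sizes from Section II-A; ASC is ASSUMED to be the positive class.
N_ASC, N_TD = 24, 22

for name, (sens, spec, reported) in table_iii.items():
    implied = accuracy_from_rates(sens, spec, N_ASC, N_TD)
    print(f"{name}: implied {implied:.1f}%, reported {reported}%")
```

Under that assumption the implied accuracies are 70.7%, 70.1%, 62.1% and 59.7%, each of which rounds to the value reported in the table.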
of more complex movement imitation tasks and preliminary results are promising. In addition, to ensure the results are robust, we also plan to validate the models with a newly collected independent data set.

Fig. 4. Accuracy distributions for four models with different feature selection. [Plot axes: algorithm (SVM-RFE, t-test, ReliefF, mRMR) against accuracy, %.]

[Figure residue: panels "t-test" and "SVM-RFE"; axis "True label" with classes ASC and TD; surviving cell values 15, 7 and 13, 9.]

REFERENCES
[1] American Psychiatric Association, Diagnostic and statistical manual of mental disorders: DSM-5®. Washington, DC: American Psychiatric Pub, 2013.
[2] K. A. Fournier, C. J. Hass, S. K. Naik, et al., "Motor Coordination in Autism Spectrum Disorders: A Synthesis and Meta-Analysis," Journal of autism and developmental disorders, vol. 40, no. 10, pp. 1227–1240, 2010.
[3] M. Mari, U. Castiello, D. Marks, et al., "The reach-to-grasp movement in children with autism spectrum disorder," Philosophical Transactions of the Royal Society of London Series B-Biological Sciences, vol. 358, no. 1430, pp. 393–403, 2003.
[4] C. M. Freitag, C. Kleser, M. Schneider, et al., "Quantitative assessment of neuromotor function in adolescents with high functioning autism and Asperger syndrome," Journal of autism and developmental disorders, vol. 37, no. 5, pp. 948–959, 2007.