1 Introduction

Mild cognitive impairment (MCI) is a neurological condition of older adults characterized by measurable cognitive decline. It is often considered the first clinical precursor of dementia such as Alzheimer’s disease (AD), identified when an individual performs poorly on standard neuropsychological tests [1]. Recently, several studies have suggested that subjective cognitive decline (SCD), which refers to individuals with self-reported memory complaints, may be the first clinical marker of AD, appearing even before MCI [2]. Individuals with SCD have been shown to have a greater presence of AD biomarkers than those without SCD and a higher risk of progression to AD dementia [3]. Longitudinal studies have found that SCD and MCI are associated with a similarly increased risk of AD and that both predict rapid cognitive decline [4]. These findings support the idea that SCD may be an early clinical marker of AD that precedes MCI. To enable early intervention and delay significant impairment, it is important to identify clinically and cognitively normal individuals who are at risk of AD dementia, especially in the early stage of the disease.

Magnetic resonance imaging (MRI) non-invasively captures internal body structures, helping us understand the anatomical and functional brain changes related to AD [5]. Some studies have found that hippocampal atrophy occurs before the onset of AD. One study reported that, compared with healthy individuals without SCD, individuals with SCD show a pattern of hippocampal subfield atrophy similar to that measured in AD pathology [6]. These findings indicate that the changes in hippocampal subfields of SCD individuals are topographically similar to those found in AD. More recently, a study compared SCD, MCI and normal control (NC) individuals using the volumes and asymmetries of the hippocampus, amygdala and temporal horn, and assessed their relationships with cognitive function in an elderly population in China [5]. In that study, significant differences (P < 0.05) were found among the three groups in the volumes and asymmetries of both the hippocampus and the amygdala on structural MR images.

The above studies mainly investigated the relationship between brain atrophy and the risk of dementia across SCD, MCI and potential AD using structural MRI (sMRI). However, these methods are limited in their ability to explore how multiple factors jointly affect the risk of dementia. With the growing popularity of machine learning, various methods have been investigated for MR image analysis to find biomarkers relevant to the prediction and analysis of disease [7]. In addition to sMRI-based assessment, the Mini-Mental State Examination (MMSE) and the Montreal Cognitive Assessment (MoCA) are often used for initial screening of various types of cognitive impairment and dementia. The NC group has the highest average score on both the MMSE and MoCA tests, and these cognitive scores decrease as dementia develops from SCD, SMCI to AD. It is therefore valuable to relate neuroimaging biomarkers to MMSE and MoCA scores in order to assess and predict them.

In this work, we investigate multi-scale brain regions, from the ROIs of the whole brain down to the subregions of the hippocampus and amygdala, to predict MMSE and MoCA scores in the early stages of SCD and MCI. We extract three subsets of volumetric features: one from the brain ROIs, one from the hippocampal subregions and one from the amygdala subregions. Sparse coding is then applied to identify the relevant features in each subset. Finally, a proximity-based random forest is used to combine the three sets of volumetric features and build a regression model for the assessment of MMSE and MoCA scores. This study seeks the correlation between the volumes of the multi-scale brain regions and dementia risk, to better understand the roles of these regions in cognitive impairment. The remainder of this paper is organized as follows. Section 2 presents the materials used in this work and the details of the proposed method. Section 3 presents the experimental results and discussion. Finally, Sect. 4 concludes the paper.

2 Materials and Methods

In this section, we introduce the data set used in this study and then describe the proposed regression method in detail. Figure 1 shows the flowchart of the proposed regression framework, which consists of image acquisition and processing, feature extraction and selection, and final score regression.

Fig. 1. The flowchart of our proposed regression framework to integrate sparse coding and random forest models for MMSE and MoCA score predictions.

2.1 Materials and Image Processing

The data set in this study was obtained from the Shanghai Mental Health Center, China. The participants were recruited from the China Longitudinal Aging Study (CLAS) of Cognitive Impairment (NCT03672448), which started in 2011 [8]. The study includes 226 subjects (36 with amnestic MCI, 112 with SCD and 78 NC) recruited from a community-based study of individuals aged above 60 in Shanghai, China. Table 1 shows the demographic and clinical information of the studied subjects.

Table 1. Demographic and clinical information of the subjects (Mean ± standard deviation).

All T1-weighted MR brain images are segmented into the 50 regions of interest (ROIs) listed in Table 2 with the fully automated pipeline of FreeSurfer 6.0.0 [9]. The ROI volumes form one subset of features for regression. In addition, the cortex, gray matter (GM) and white matter (WM) volumes of the left and right hemispheres and the supratentorial volume are included in this feature set, giving 57 volumes in total.

Table 2. The segmented 50 ROIs of the whole brain.

Furthermore, to investigate the complex structure of the hippocampus and amygdala, FreeSurfer is used to partition these ROIs into 44 and 20 subregions, respectively, as shown in Fig. 2. The volumes of these subregions form two further feature sets for predicting the cognitive scores.

Fig. 2. The segmented subregions of the hippocampus and amygdala on one side of the brain.

2.2 The Proposed Prediction Method

After segmentation, three subsets of volume features are obtained from the ROIs and from the subregions of the hippocampus and amygdala to predict the MMSE and MoCA scores. The proposed method first identifies the most relevant features in each subset and then applies random forest regression to predict the clinical scores.

First, sparse coding is used to select the most relevant features in each subset. It considers combinations of features over different brain regions and can therefore handle multivariate interactions. Let \( y \) denote the vector of clinical scores of the training data, \( \mathbf{A} \) the \( M \times N \) feature matrix for M participants and N features, and \( \vec{\omega} = \left( \omega_{1}, \omega_{2}, \ldots, \omega_{N} \right)^{T} \) the coefficient vector assigned to the N features. An \( L_1 \)-regularized sparsity constraint is imposed on the coefficients to choose the relevant features for regression. The \( L_1 \)-regularized least squares problem can be formulated as:

$$ \vec{\omega} = \mathop{\text{argmin}}_{\vec{\omega}} \left\| y - \mathbf{A}\vec{\omega} \right\|_{2}^{2} + \gamma \left\| \vec{\omega} \right\|_{1}, \quad \text{s.t.} \quad \omega_{i} \ge 0, \; \forall i $$
(1)

where γ is the sparsity regularization parameter, which controls the number of zero coefficients in \( \vec{\omega} \). The non-zero elements of \( \vec{\omega} \) indicate that the corresponding features are relevant to the regression. A grid search with cross-validation on the training samples can be used to obtain the optimal value of γ.
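
As a minimal sketch of Eq. (1), the non-negative \( L_1 \)-regularized problem can be solved by cyclic coordinate descent, where each coefficient has the closed-form update \( \omega_j = \max\left(0, (a_j^{T} r_j - \gamma/2) / a_j^{T} a_j\right) \), with \( a_j \) the j-th column of the feature matrix and \( r_j \) the residual computed without feature j. The solver choice, the function names `nonneg_lasso` and `select_features`, and the toy data are assumptions made for illustration; the paper does not state which solver was used.

```python
def nonneg_lasso(A, y, gamma, n_iter=200):
    """Minimize ||y - A w||_2^2 + gamma * ||w||_1 subject to w >= 0.

    A is a list of M rows with N features each, y a list of M scores.
    Cyclic coordinate descent; an illustrative solver, not necessarily
    the one used in the paper.
    """
    m, n = len(A), len(A[0])
    w = [0.0] * n
    residual = list(y)  # equals y - A w while w is all zeros
    col_sq = [sum(A[i][j] ** 2 for i in range(m)) for j in range(n)]
    for _ in range(n_iter):
        for j in range(n):
            if col_sq[j] == 0.0:
                continue  # constant-zero feature carries no information
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(A[i][j] * (residual[i] + A[i][j] * w[j]) for i in range(m))
            new_wj = max(0.0, (rho - gamma / 2.0) / col_sq[j])
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(m):
                    residual[i] -= A[i][j] * delta
                w[j] = new_wj
    return w


def select_features(w):
    """Indices of features with non-zero coefficients."""
    return [j for j, wj in enumerate(w) if wj > 0.0]


# Toy data (made up): feature 0 drives the score, feature 1 is weak,
# feature 2 is negatively related and is removed by the constraint w >= 0.
A_toy = [[1.0, 0.0, 1.0],
         [0.0, 1.0, 0.0],
         [1.0, 0.0, -1.0],
         [0.0, 1.0, 0.0]]
y_toy = [2.0, 0.1, 4.0, 0.1]
w_toy = nonneg_lasso(A_toy, y_toy, gamma=1.0)
print(select_features(w_toy))
```

In practice the feature columns would typically be standardized first, and γ would be chosen by the cross-validated grid search described above.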

Second, random forest [10, 11] is used to compute proximity measures and to regress the scores on the selected features. It also reports the importance of the features in each subset. For a regression task, the decision trees act as regression trees. As a tree grows, each node is split on the feature that minimizes the difference between the prediction errors of the left and right subsets. When the prediction error falls below a threshold, the node stops splitting and becomes a terminal node. The feature importance is calculated from the difference between the prediction errors of the left and right subsets, and each importance weight is normalized to the range 0–1. After training, the random forest generates proximity measures, each giving the proportion of the T trees in which two subjects fall into the same leaf node. Together, sparse coding and random forest feature importance provide a two-stage feature selection that better explores the features relevant to prediction.
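
The proximity measure described above can be sketched as follows, assuming the leaf reached by every subject in every tree is already known. The function name `proximity_matrix` and the leaf assignments are hypothetical and serve only as an illustration.

```python
def proximity_matrix(leaf_ids):
    """Proximity between subjects from per-tree leaf assignments.

    leaf_ids[t][i] is the leaf reached by subject i in tree t.
    Entry (i, j) is the fraction of the T trees in which subjects
    i and j end up in the same leaf node.
    """
    n_trees = len(leaf_ids)
    n_subjects = len(leaf_ids[0])
    counts = [[0] * n_subjects for _ in range(n_subjects)]
    for tree_leaves in leaf_ids:
        for i in range(n_subjects):
            for j in range(n_subjects):
                if tree_leaves[i] == tree_leaves[j]:
                    counts[i][j] += 1
    return [[c / n_trees for c in row] for row in counts]


# Made-up example: 4 trees, 3 subjects.
leaves = [[0, 0, 1],
          [2, 3, 3],
          [5, 5, 5],
          [1, 2, 2]]
for row in proximity_matrix(leaves):
    print(row)
```

With scikit-learn, for example, per-tree leaf indices of a fitted forest can be obtained from the `apply` method of `RandomForestRegressor`, which returns one row per sample, so the array would need to be transposed to match the layout assumed here.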

Finally, after three individual random forest models are trained to predict the scores from the three subsets of features, their proximity matrices are linearly combined into a final proximity matrix:

$$ P = w_{1} P_{1} + w_{2} P_{2} + \left( 1 - w_{1} - w_{2} \right) P_{3} $$
(2)

where \( P \) denotes the final proximity matrix, \( P_{1}, P_{2}, P_{3} \) are the proximity matrices of the three feature subsets, and \( w_{1}, w_{2} \) are the weights assigned to the corresponding subsets. The composite proximity matrix \( P \) is input to the random forest model to combine the three subsets of features for prediction of the scores.
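
The linear combination in Eq. (2) and a candidate grid for the two weights can be sketched as below. The helper names `combine_proximities` and `weight_grid`, the grid step and the requirement that all three weights be non-negative are assumptions for illustration; the paper specifies only that the third weight is one minus the other two.

```python
def combine_proximities(p1, p2, p3, w1, w2):
    """Return P = w1*P1 + w2*P2 + (1 - w1 - w2)*P3 for square matrices."""
    w3 = 1.0 - w1 - w2
    # Assumption: weights are kept non-negative (small tolerance for rounding).
    if min(w1, w2, w3) < -1e-12:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(p1)
    return [[w1 * p1[i][j] + w2 * p2[i][j] + w3 * p3[i][j]
             for j in range(n)] for i in range(n)]


def weight_grid(step=0.1):
    """Yield (w1, w2) pairs with w1, w2 >= 0 and w1 + w2 <= 1."""
    k = round(1.0 / step)
    for a in range(k + 1):
        for b in range(k + 1 - a):
            yield a / k, b / k


# Made-up 2x2 proximity matrices for the three feature subsets.
P1 = [[1.0, 0.0], [0.0, 1.0]]
P2 = [[1.0, 1.0], [1.0, 1.0]]
P3 = [[1.0, 0.5], [0.5, 1.0]]
print(combine_proximities(P1, P2, P3, w1=0.5, w2=0.25))
```

Each pair produced by `weight_grid` would be scored on the training folds only, and the best-performing pair kept for testing.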

3 Experimental Results

3.1 Datasets and Implementation

The data used in our experiments come from the 226 subjects detailed in Sect. 2.1. In our experiments, the out-of-bag (OOB) error stabilizes when \( nTree \gtrsim 500 \), and the optimal number of trees in the forest is \( nTree = 1000 \). The weighting parameters \( w_{1}, w_{2} \) are optimized via grid search during training to obtain the best random forest regression performance. 10-fold cross-validation is used to evaluate the proposed method. It is repeated 10 times, and the final result is obtained by averaging the 10 test results to reduce the influence of chance. To evaluate the prediction performance, we compute the mean squared error (MSE) and the mean absolute error (MAE) between the actual and estimated MMSE and MoCA scores, averaged over the ten repetitions. In addition, Pearson’s correlation coefficient (CORR) between the actual and estimated scores is used to measure how well the regression represents the data.
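
The three evaluation measures and one way to draw the folds can be sketched as follows. The function names, the shuffling scheme and the example scores are illustrative assumptions rather than the implementation used in the experiments.

```python
import math
import random


def mse(y_true, y_pred):
    """Mean squared error between actual and estimated scores."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error between actual and estimated scores."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def pearson_corr(y_true, y_pred):
    """Pearson's correlation coefficient between two score lists."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((a - mean_t) * (b - mean_p) for a, b in zip(y_true, y_pred))
    var_t = sum((a - mean_t) ** 2 for a in y_true)
    var_p = sum((b - mean_p) ** 2 for b in y_pred)
    return cov / math.sqrt(var_t * var_p)


def kfold_indices(n_samples, n_folds, seed):
    """Shuffle subject indices and deal them into n_folds test folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[f::n_folds] for f in range(n_folds)]


# Made-up scores for four subjects.
actual = [28, 25, 30, 22]
estimated = [27, 26, 29, 24]
print(mse(actual, estimated), mae(actual, estimated))

# Ten repetitions of 10-fold cross-validation on 226 subjects:
# a different seed per repetition gives a different partition.
partitions = [kfold_indices(226, 10, seed=rep) for rep in range(10)]
```

Averaging each measure over the ten repetitions then gives the final reported value.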

3.2 Results on Prediction of Cognitive Scores

The first experiment tests the effect of the different feature subsets on MMSE and MoCA prediction. We also compare the t-test and sparse coding as feature selection methods. For sparse coding, features are selected separately from each of the three subsets to obtain more precise proximity matrices. For the t-test, the data are divided into two groups according to score level, and features are selected by comparing the two groups. The prediction results for the different features and their combinations are listed in Tables 3 and 4 for MMSE and MoCA, respectively. The results show that the volume features from the subregions of the hippocampus and amygdala perform better than the ROI features, and that sparse coding performs better than the t-test. Specifically, the proposed combination achieves the highest correlation coefficients of 0.469 and 0.436.

Table 3. Performance comparison for prediction of MMSE scores using different features.
Table 4. Performance comparison for prediction of MoCA scores using different features.

The second experiment tests the effect on prediction performance of the weighted combination (WC) of the proximity matrices, which fuses the three subsets of features. A direct alternative is to concatenate the selected features from the different subsets as the input to the regression model. Table 5 lists the prediction performance of both approaches, and Fig. 3 shows the corresponding scatter plots. The proposed weighted combination performs better than feature concatenation.

Table 5. Performance comparison for prediction of clinical scores with different combinations.

Fig. 3. The prediction results of (a) MMSE and (b) MoCA scores by feature concatenation, and of (c) MMSE and (d) MoCA scores by weighted combination.

3.3 Biomarkers Relevant to the Predictions of Cognitive Scores

In this section, we investigate the biomarkers relevant to disease interpretation. For each feature, we count the number of times it is selected across the 10 folds and call this its frequency. Features with a frequency higher than 8 are taken as the relevant biomarkers for each partition. We found that hippocampal atrophy in the right hemisphere carries a higher weight on the scores than in the left, while the opposite holds for the amygdala. The hippocampal fimbria shows the highest weight among all ROIs, with the right fimbria weighted higher than the left. These results indicate that the most commonly selected regions are consistent with studies of AD pathology [5, 6, 12].
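
The frequency criterion described above can be sketched as a simple count over folds. The function name `stable_features` and the feature labels are placeholders, not the selections obtained in the experiments.

```python
from collections import Counter


def stable_features(selected_per_fold, min_frequency=8):
    """Features whose selection frequency is higher than min_frequency.

    selected_per_fold holds one collection of selected feature names
    per cross-validation fold; each feature counts once per fold.
    """
    counts = Counter(name for fold in selected_per_fold for name in set(fold))
    return sorted(name for name, c in counts.items() if c > min_frequency)


# Placeholder selections over 10 folds: feature_A is chosen in all 10,
# feature_B in 8 and feature_C in 9.
folds = [["feature_A", "feature_B", "feature_C"] for _ in range(8)]
folds.append(["feature_A", "feature_C"])
folds.append(["feature_A"])
print(stable_features(folds))
```

With 10 folds, a frequency higher than 8 means the feature was selected in at least 9 of them, so `feature_B` is excluded in this example.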

4 Conclusion

In this paper, we have proposed a combined regression framework based on sparse coding and random forest for the prediction of MMSE and MoCA scores. It enables MRI-based analysis of the SCD group, which is rarely addressed in current research. Three sets of volumetric features are extracted from the ROIs of the whole brain and from the subregions of the hippocampus and amygdala. Sparse coding is applied to select the features relevant to clinical score estimation. Beyond the whole-brain ROIs, the hippocampus and amygdala are further subdivided into subregions. Comparison with the whole-brain features suggests that the amygdala is most closely associated with the clinical scores, followed by the hippocampus. These results are consistent with related clinical studies and support computer-aided diagnosis and prediction of AD progression through the analysis of brain MRI.