Abstract
Emotions play an important role in everyday life, and understanding them can improve relationships between humans and even between humans and machines. An emotion is a mental state and a reaction caused by an event, grounded in subjective experience, and it can be conditioned by the environment as well as by factors inherent to the subject. This makes recognizing emotions a difficult task, because subjects can fake their reactions. Therefore, methodologies that detect people's real emotions represent a significant advance in this field. The EEG is an electrical signal that originates in the brain; it encodes a human's internal processes, including emotions, and cannot be cheated or faked. However, finding patterns in this signal is a difficult task for researchers. In this work, our goal is to present a methodology for recognizing emotions by measuring the two essential scales that usually encode them: arousal and valence. The results achieved in emotion classification show efficient performance in this task.
1 Introduction
Emotion recognition is an increasingly important research subject in communication between humans and machines, driving the development of technologies that allow more natural interaction. Emotions are fundamental in the daily life of human beings, as they play an essential role in human cognition, rational decision-making, perception, human interaction, and human intelligence [3].
Emotion is a mental state and an affective reaction caused by an event, grounded in subjective experience. There is, however, an explicit separation between physiological arousal, behavioral expression (affect), and the conscious experience of emotion (feeling) [10, 11]. Emotions play an important role in human communication and can be expressed verbally, through emotional vocabulary, or through nonverbal cues. Behavioral expressions such as facial expressions and gestures can always be controlled voluntarily, so they are easy to fake or change and provide unreliable information. This differs from physiological signals such as electrocardiography (ECG), electromyography (EMG), galvanic skin response (GSR), respiration rate (RR) and, particularly, electroencephalography (EEG).
Over recent years, EEG signal analysis has become the preferred technique for analyzing physiological expressions because of the information it contains, which allows differentiating emotional states and helps researchers better understand human brain physiology and psychology [8]. Moreover, the analysis of these signals enables a more reliable recognition of emotions, because the subject under test cannot alter them. Several applications have been developed in different areas such as entertainment, e-learning, virtual worlds, and e-healthcare [1].
However, EEG characterization remains an open issue that depends on the application. In this case, recognizing emotions requires generating a feature set that contains the most relevant information, specifically for the binary arousal and valence problems. The authors in [1] propose a scheme for emotion recognition based on audio-visual stimuli, using an SVM as classifier. In [7], a scheme of feature selection and extraction was applied to features computed from the EEG in the time, frequency, and time-frequency domains, followed by a quadratic discriminant analysis (QDA) classifier. In this paper, we propose a methodology for emotion recognition from EEG signals based on the valence-arousal emotion model. Spectral and temporal features are derived using the fast Fourier transform (FFT) over four frequency bands: theta (4–8 Hz), alpha (8–16 Hz), beta (16–32 Hz), and gamma (32–64 Hz). Mutual information with forward selection and backward elimination is used in the feature selection stage. A support vector machine is used in the classification stage, and two binary classification problems are posed: low/high arousal and low/high valence. The ratings for each of these scales are thresholded into two classes (low and high); on the 9-point rating scales, the threshold is simply placed in the middle. The classifier was trained with user-independent data.
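For concreteness, the following minimal sketch (Python with scikit-learn) illustrates the label binarization and SVM classification step just described; the feature matrix, ratings, and kernel choice are placeholders rather than the exact configuration used in the paper.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
features = rng.normal(size=(1280, 50))   # placeholder: trials x features
ratings = rng.uniform(1, 9, size=1280)   # placeholder 9-point self-assessment scores

# Threshold the 9-point scale in the middle: low (< 5) vs. high (>= 5).
labels = (ratings >= 5.0).astype(int)

clf = SVC(kernel='rbf')   # the kernel choice is an assumption
clf.fit(features, labels)
```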
2 Model of Emotion
An emotion is a complex psychological state that involves three distinct components: a subjective experience, a physiological response, and a behavioral or expressive response [6]. Various categorizations of emotion have been proposed. One is the discrete emotion model: according to Plutchik [13], there are eight basic emotions, namely acceptance, anger, anticipation, disgust, fear, joy, sadness, and surprise. Ekman [4] described the relationship between facial expressions and emotions; his theory proposes six basic emotions: anger, disgust, fear, happiness, sadness, and surprise. He later expanded this set by adding amusement, contempt, contentment, embarrassment, excitement, guilt, pride, relief, satisfaction, sensory pleasure, and shame.
Another is the bi-dimensional emotion model by Russell [14], the most widely used from the dimensional perspective. This model represents emotional states on a multidimensional scale spanned by valence and arousal, which can be subdivided into four quadrants: low arousal/low valence (LALV), low arousal/high valence (LAHV), high arousal/low valence (HALV), and high arousal/high valence (HAHV). Valence represents the quality of the emotion and ranges from unpleasant (e.g., sad, stressed) to pleasant (e.g., happy, elated), whereas arousal denotes the quantitative activation level and ranges from inactive (e.g., uninterested, bored) to active (e.g., alert, excited). In some variants of this model, besides the arousal and valence dimensions, an additional dimension called dominance is added; it ranges from a feeling of being in control during an emotional experience to a feeling of being controlled by the emotion.
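As a small illustration (not part of the original model description), the hypothetical helper below maps 9-point valence/arousal ratings onto the four quadrants, using the midpoint of the scale as the low/high threshold.

```python
# Hypothetical helper: map 9-point ratings to Russell-model quadrants.
def quadrant(valence: float, arousal: float, midpoint: float = 5.0) -> str:
    a = 'HA' if arousal >= midpoint else 'LA'
    v = 'HV' if valence >= midpoint else 'LV'
    return a + v  # e.g. 'HAHV' = high arousal / high valence

print(quadrant(7.2, 3.1))  # -> 'LAHV'
```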
3 Experimental Setup
3.1 Database
The dataset for emotion analysis using EEG, physiological, and video signals (DEAP) was used in this research [9]. Thirty-two participants took part in the experiment, and their EEG and peripheral physiological signals, such as electromyography (EMG), electrooculography (EOG), skin temperature, respiration pattern, blood volume pressure, and GSR, were recorded while they watched 40 selected music videos. The 40 video clips were carefully pre-selected so that their intended arousal and valence values span as large an area as possible of the arousal/valence space. Each music video lasted one minute and served as the visual stimulus to elicit different emotions. After each trial/video, the participant performed a self-assessment, giving continuous ratings from 1 to 9 for arousal, valence, like/dislike, and dominance. Self-assessment manikins were used to visualize the scales. EEG and peripheral signals were recorded at a sampling rate of 512 Hz; the data were then downsampled to 128 Hz, eye artifacts were removed, and a band-pass filter from 4 to 45 Hz was applied. For further information, interested readers can refer to [9].
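For readers who wish to reproduce this setup, the sketch below shows one way to load a subject from the preprocessed Python release of DEAP; it assumes each sXX.dat file is a pickled dictionary with a 'data' array (40 trials × 40 channels × 8064 samples at 128 Hz) and a 'labels' array (40 trials × 4 ratings: valence, arousal, dominance, liking), which should be verified against the dataset documentation.

```python
import pickle

# Load one subject; DEAP's preprocessed files were pickled under Python 2,
# hence the latin1 encoding.
with open('s01.dat', 'rb') as f:
    subject = pickle.load(f, encoding='latin1')

data = subject['data']        # EEG occupies the first 32 channels
labels = subject['labels']    # columns: valence, arousal, dominance, liking
valence, arousal = labels[:, 0], labels[:, 1]
```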
3.2 Feature Generation
In the design of an emotion recognition system, the selection of effective features is an important step. Coan et al. [2] showed that positive emotions are associated with left frontal brain activity, whereas negative emotions are associated with right frontal brain activity. They also revealed that the decrease in activity in other brain regions, such as the central, temporal, and mid-frontal regions, was smaller than in the frontal region. Therefore, only ten channels of the EEG record were selected: F3, F4, F7, F8, FC1, FC2, FC5, FC6, Fp1, and Fp2.
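A sketch of this channel selection follows; the name-to-index map assumes DEAP's Geneva channel ordering and should be checked against the dataset's channel list before use.

```python
import numpy as np

data = np.zeros((40, 32, 8064))   # placeholder: trials x EEG channels x samples

# Assumed indices of the ten frontal channels in DEAP's Geneva ordering.
CHANNELS = {'Fp1': 0, 'F3': 2, 'F7': 3, 'FC5': 4, 'FC1': 5,
            'Fp2': 16, 'F4': 19, 'F8': 20, 'FC6': 21, 'FC2': 22}

frontal = data[:, list(CHANNELS.values()), :]   # trials x 10 channels x samples
```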
Time Domain Features. Time-domain features are computed from the natural representation of the EEG. We consider the time descriptors in Table 1.
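Since Table 1 is not reproduced here, the sketch below assumes a typical set of time-domain descriptors (statistical moments and Hjorth parameters); the paper's actual descriptors may differ.

```python
import numpy as np
from scipy.stats import kurtosis, skew

def hjorth(x):
    """Hjorth activity, mobility, and complexity of a 1-D signal."""
    dx, ddx = np.diff(x), np.diff(x, n=2)
    activity = np.var(x)
    mobility = np.sqrt(np.var(dx) / activity)
    complexity = np.sqrt(np.var(ddx) / np.var(dx)) / mobility
    return activity, mobility, complexity

def time_features(x):
    # Assumed descriptors: mean, std, skewness, kurtosis, Hjorth parameters.
    return np.array([x.mean(), x.std(), skew(x), kurtosis(x), *hjorth(x)])
```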
Frequency Domain Features. The frequency-domain features are computed based on the well-known fast Fourier transform (FFT) to discriminate harmonic patterns. Here, we denote the spectrum vector as \(\varXi (f)\in \mathbb {R}^H\), \(\mathbf {\Lambda }\in \mathbb {R}^H\) the frequency index vector with \(\lambda _h=hF/2H\), and \(F\in \mathbb {R}\) the sampling frequency. The features computed from the FFT are listed in Table 2.
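A minimal band-power sketch based on the FFT, using the paper's band edges and the DEAP sampling rate of 128 Hz:

```python
import numpy as np

BANDS = {'theta': (4, 8), 'alpha': (8, 16), 'beta': (16, 32), 'gamma': (32, 64)}

def band_powers(x, fs=128.0):
    """Sum of spectral power within each band of a 1-D signal."""
    spectrum = np.abs(np.fft.rfft(x)) ** 2        # power spectrum
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)   # frequency index vector
    return {name: spectrum[(freqs >= lo) & (freqs < hi)].sum()
            for name, (lo, hi) in BANDS.items()}
```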
Time-Frequency Domain Features. To find an informative representation of the EEG signal that relates time-domain events with frequency-domain ones, we compute the Hilbert-Huang spectrum (HHS) of each signal, which is obtained via empirical mode decomposition into intrinsic mode functions (IMFs) that represent the original signal. We also apply the discrete wavelet transform (DWT), which decomposes the signal into approximation and detail levels corresponding to different frequency ranges while preserving the time information of the signal.
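The HHS requires an empirical mode decomposition implementation (e.g., the PyEMD package), omitted here; the sketch below covers only the DWT part, using PyWavelets. At 128 Hz, a 4-level decomposition yields detail bands that roughly match the paper's ranges (D1: 32–64 Hz, D2: 16–32 Hz, D3: 8–16 Hz, D4: 4–8 Hz); the 'db4' mother wavelet is an assumption, as the paper does not state which one was used.

```python
import numpy as np
import pywt

x = np.random.randn(8064)                        # one channel of one trial
coeffs = pywt.wavedec(x, 'db4', level=4)         # [A4, D4, D3, D2, D1]
energies = [np.sum(c ** 2) for c in coeffs[1:]]  # energy per detail level
```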
Electrode Combination-Based Features. Considering the relations between EEG channels, we calculate the magnitude-squared coherence estimate as \(C_{ij}=\frac{\left| P_{ij}(f)\right| ^2}{P_i(f)P_j(f)}\), where \(P_{ij}\) is the cross-power spectral density of a pair of electrodes i, j, and the differential asymmetry as \(\varDelta \xi =\xi _l-\xi _r\) for electrodes l and r on the left/right hemispheres of the scalp; both measure the relations between channels.
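A sketch of both features for one electrode pair, using scipy's magnitude-squared coherence estimator; the signals and the per-channel feature \(\xi \) are placeholders.

```python
import numpy as np
from scipy.signal import coherence

fs = 128.0
left = np.random.randn(8064)    # placeholder for a left-hemisphere channel (e.g., F3)
right = np.random.randn(8064)   # placeholder for its right counterpart (e.g., F4)

f, c_ij = coherence(left, right, fs=fs, nperseg=256)  # C_ij over frequency

xi_left, xi_right = left.var(), right.var()  # stand-in per-channel feature xi
asymmetry = xi_left - xi_right               # differential asymmetry
```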
4 Feature Selection Based on Mutual Information
Let \(\left\{ {\mathbf {x}}_i, y_i\right\} _{i=1}^{N}\) be the training data set of a multi-class classification problem, where \({\mathbf {x}}_i\) is a \(P-\)dimensional feature vector corresponding to instance i, and \(y_i \in \left\{ 1, \dots , C\right\} \) is the label for instance \({\mathbf {x}}_i\). For compactness, we define the input matrix \({\mathbf {X}} = \left\{ {\mathbf {x}}_i\right\} _{i=1}^{N} \in \mathbb {R}^{N\times P}\) and the label vector \({\mathbf {y}} =\left\{ y_1, \dots , y_N\right\} \in \mathbb {R}^N\). Similarly, let \(\varvec{\zeta }_j \in \mathbb {R}^{N}\) be column \(j = \left\{ 1, \ldots , P\right\} \) of the matrix \({\mathbf {X}}\). We use the criterion proposed by Peng et al. [12], called minimal-redundancy-maximal-relevance (mRMR), which combines the max-relevance (D) and min-redundancy (R) criteria. mRMR finds a set \({\mathbf {s}} \in \mathbb {R}^L\), with \(L\le P\), containing the indices \(j = \left\{ 1, \ldots , P\right\} \) of the most relevant features, which jointly achieve the highest explanation of the target class \({\mathbf {y}}\). mRMR is obtained by maximizing \(\varPhi \left( \text {D},\text {R}\right) \), where \(\varPhi \) is defined as \(\varPhi \left( \text {D},\text {R}\right) =\text {D}-\text {R}\). Now, the max-relevance D and min-redundancy R criteria are defined as follows:

\(\text {D} = \frac{1}{|{\mathbf {s}}|_{\#}} \sum _{j \in {\mathbf {s}}} {\text {I}}\left( \varvec{\zeta }_j; {\mathbf {y}}\right) , \qquad \text {R} = \frac{1}{|{\mathbf {s}}|_{\#}^{2}} \sum _{j,k \in {\mathbf {s}}} {\text {I}}\left( \varvec{\zeta }_j; \varvec{\zeta }_k\right) ,\)

where \(|{\mathbf {s}}|_{\#}\) represents the cardinality of \({\mathbf {s}}\), and \({\text {I}}({\mathbf {a}}; {\mathbf {b}})\) is the mutual information between \({\mathbf {a}}\) and \({\mathbf {b}}\). A remaining issue is how to determine the optimal number of features L. The algorithms used for this task are based on two searches, forward selection (FS) and backward elimination (BE), over the matrix \({\mathbf {Q}} \in \mathbb {R}^{P\times P}\), whose element (j, k) is defined as \( {Q}_{j,k} = {\text {I}}\left( \varvec{\zeta }_j,\varvec{\zeta }_k; {\mathbf {y}}\right) \).
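A greedy forward-selection sketch of the mRMR criterion, estimating relevance and redundancy with scikit-learn's mutual information estimators; this illustrates the criterion rather than the authors' exact implementation.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

def mrmr_forward(X, y, L):
    """Greedily pick L feature indices maximizing Phi = D - R."""
    relevance = mutual_info_classif(X, y)   # I(zeta_j; y) per feature
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(L):
        best_j, best_phi = None, -np.inf
        for j in remaining:
            # Average redundancy against already-selected features.
            red = (np.mean([mutual_info_regression(X[:, [k]], X[:, j])[0]
                            for k in selected]) if selected else 0.0)
            phi = relevance[j] - red
            if phi > best_phi:
                best_j, best_phi = j, phi
        selected.append(best_j)
        remaining.remove(best_j)
    return selected
```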
5 RUSBoost Ensemble
Given an imbalanced training set \(\left\{ {\mathbf {X}}, {\mathbf {y}}\right\} \), the RUSBoost algorithm proposed by Seiffert et al. [15] combines two components, random under-sampling and adaptive boosting (AdaBoost), both used for imbalanced classification. Here, we briefly describe both techniques in order to describe the RUSBoost algorithm.
Random Under-Sampling: Data sampling techniques attempt to alleviate the problem of class imbalance by adjusting the class distribution of the training data set. Random under-sampling accomplishes this by removing randomly chosen examples from the majority class until the desired class distribution is reached.
AdaBoost: Boosting is a meta-learning technique designed to improve the classification performance of weak learners. The main idea of boosting is to iteratively create an ensemble of weak hypotheses, which are combined to predict the class of unlabeled examples. Initially, all examples in the training dataset are assigned equal weights. During each iteration of AdaBoost, a weak hypothesis is formed by the base learner. The error associated with the hypothesis is calculated, and the weight of each example is adjusted such that misclassified examples have their weights increased while correctly classified examples have their weights decreased. Therefore, subsequent iterations of boosting generate hypotheses that are more likely to classify the previously mislabeled examples correctly. Once all iterations are completed, a weighted vote of all hypotheses assigns a class to each unlabeled example. Since boosting assigns higher weights to misclassified examples, and minority-class examples are those most likely to be misclassified, minority-class examples receive higher weights during the boosting process, making it similar in many ways to cost-sensitive classification [5].
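The imbalanced-learn package offers a ready-made RUSBoost implementation; the sketch below applies it under assumed defaults (the paper's base learners and hyperparameters are not restated here), with placeholder data and the 70/30 hold-out split used in the experiments.

```python
import numpy as np
from imblearn.ensemble import RUSBoostClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
features = rng.normal(size=(880, 50))   # placeholder feature matrix
labels = rng.integers(0, 2, size=880)   # binary target; imbalanced in practice

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.3, random_state=0)  # 70/30 hold-out

model = RUSBoostClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
```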
6 Results
Both sequential searches (FS and BE) show, for the arousal and valence targets, that more than 1200 features are used, almost all of which belong to the frequency domain, with some from the time-frequency domain (Fig. 1).
Tables 3 and 4 show the performance obtained by the base learners using the full feature set (NSF), the features selected with FS, and the features selected with BE. These results were obtained with hold-out validation using 70% of the data for training and 30% for testing.
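The three reported statistics can be computed from the binary confusion matrix as follows; y_true and y_pred stand in for the hold-out test labels and the classifier's predictions.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])   # placeholder test labels
y_pred = np.array([0, 1, 1, 1, 0, 0, 1, 0])   # placeholder predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)   # true-positive rate
specificity = tn / (tn + fp)   # true-negative rate
```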
The decision tree was the base learner with the highest classification accuracy under NSF, FS, and BE. The SVM also reached results similar to the decision tree for both the arousal and valence targets; however, the SVM was more unstable than the decision tree. It is important to note that both selection strategies considerably improved classification in the three statistics (accuracy, sensitivity, and specificity). This indicates that several features may indeed confuse a classifier, possibly because of the redundancy many of them introduce, or because some of these features offer no information about the state of interest.
From the tables of results, we can conclude that the valence target is more difficult to detect than arousal; this could be due to the features computed, or the selected channels may not be sufficient for valence recognition. Nevertheless, with the ensembles it is possible to obtain classification results above 70% in accuracy, sensitivity, and specificity. Even though we did not tune the parameters of some base learners, these results prove that more efficient performance is achievable, showing that ensemble classifiers can be stronger than single classifiers.
It is important to remark that our data include only the 22 participants with the most balanced classes for both scales, arousal and valence. Moreover, we tested with all participants mixed together, instead of leaving one participant out for testing and training on the remaining ones.
7 Conclusions
We have presented an effective strategy to classify emotions that achieves accuracy above 70%, with a feature selection stage that efficiently finds the feature set that best explains the target label for both scales. Although our use of the data does not follow the state-of-the-art protocol of training on some participants and testing on the remaining ones, our methodology proves to be a viable alternative; as future work, we propose adopting the same training/test structure as state-of-the-art methods.
References
1. Ali, M., Mosa, A.H., Al Machot, F., Kyamakya, K.: EEG-based emotion recognition approach for e-healthcare applications. In: 2016 Eighth International Conference on Ubiquitous and Future Networks (ICUFN), pp. 946–950. IEEE (2016)
2. Coan, J.A., Allen, J.J.B., Harmon-Jones, E.: Voluntary facial expression and hemispheric asymmetry over the frontal cortex. Psychophysiology 38(6), 912–925 (2001)
3. Damasio, A.R., Sutherland, S.: Descartes' error: emotion, reason and the human brain. Nature 372(6503), 287–287 (1994)
4. Ekman, P.: Basic emotions. In: Handbook of Cognition and Emotion, pp. 45–60. Wiley, Hoboken (1999)
5. Elkan, C.: The foundations of cost-sensitive learning. In: International Joint Conference on Artificial Intelligence, vol. 17, pp. 973–978. Lawrence Erlbaum Associates Ltd. (2001)
6. Hockenbury, D.H., Hockenbury, S.E.: Discovering Psychology. Macmillan, London (2010)
7. Jenke, R., Peer, A., Buss, M.: Feature extraction and selection for emotion recognition from EEG. IEEE Trans. Affect. Comput. 5(3), 327–339 (2014)
8. Kim, M.-K., Kim, M., Oh, E., Kim, S.-P.: A review on the computational methods for emotional state estimation from the human EEG. Comput. Math. Methods Med. 2013, 13 pages (2013)
9. Koelstra, S., et al.: DEAP: a database for emotion analysis using physiological signals. IEEE Trans. Affect. Comput. 3(1), 18–31 (2012)
10. Mauss, I.B., Robinson, M.D.: Measures of emotion: a review. Cogn. Emot. 23(2), 209–237 (2009)
11. Nivedha, R., Brinda, M., Vasanth, D., Anvitha, M., Suma, K.V.: EEG based emotion recognition using SVM and PSO. In: 2017 International Conference on Intelligent Computing, Instrumentation and Control Technologies (ICICICT), pp. 1597–1600. IEEE (2017)
12. Peng, H., Long, F., Ding, C.: Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 27(8), 1226–1238 (2005)
13. Plutchik, R.: The nature of emotions: human emotions have deep evolutionary roots, a fact that may explain their complexity and provide tools for clinical practice. Am. Sci. 89(4), 344–350 (2001)
14. Russell, J.A.: A circumplex model of affect. J. Pers. Soc. Psychol. 39(6), 1161 (1980)
15. Seiffert, C., Khoshgoftaar, T.M., Van Hulse, J., Napolitano, A.: RUSBoost: a hybrid approach to alleviating class imbalance. IEEE Trans. Syst. Man Cybern. Part A Syst. Hum. 40(1), 185–197 (2010)
Acknowledgments
We thank the master's degree program in Electrical Engineering at the Universidad Tecnológica de Pereira for its support of and commitment to this research.