Article
Physiological Signal-Based Real-Time Emotion Recognition
Based on Exploiting Mutual Information with Physiologically
Common Features
Ean-Gyu Han 1, Tae-Koo Kang 2,* and Myo-Taeg Lim 1,*
1 School of Electrical Engineering, Korea University, Seoul 02841, Republic of Korea; wlrdmlrja@korea.ac.kr
2 Department of Human Intelligence and Robot Engineering, Sangmyung University,
Cheonan 31066, Republic of Korea
* Correspondence: tkkang@smu.ac.kr (T.-K.K.); mlim@korea.ac.kr (M.-T.L.)
Abstract: This paper proposes a real-time emotion recognition system that utilizes photoplethysmog-
raphy (PPG) and electromyography (EMG) physiological signals. The proposed approach employs a
complex-valued neural network to extract common features from the physiological signals, enabling
successful emotion recognition without interference. The system comprises three stages: single-pulse
extraction, a physiological coherence feature module, and a physiological common feature module.
The experimental results demonstrate that the proposed method surpasses alternative approaches
in terms of accuracy and the recognition interval. By extracting common features of the PPG and
EMG signals, this approach achieves effective emotion recognition without mutual interference. The
findings provide a significant advancement in real-time emotion analysis and offer a clear and concise
framework for understanding individuals’ emotional states using physiological signals.
Keywords: emotion recognition; physiological signal; PPG; EMG; multimodal network; convolutional
autoencoder; short-time Fourier transform (STFT); complex-valued convolutional neural network
(CVCNN)
signals to recognize emotions rather than EEG, as they contain extensive emotion-specific
information and can be incorporated into wearable devices practically [16–18]. Thus, they
are easily measurable and somewhat less complex to analyze compared to EEG signals.
Therefore, in this study, we focused on emotion recognition using a deep learning
model based on PPG and EMG signals.
Psychologists and engineers have attempted to analyze these data to explain and cate-
gorize emotions. Although there are strong relationships between physiological signals and
human emotional states, traditional manual feature extraction suffers from fundamental
limitations in describing emotion-related characteristics of physiological signals.
1. Hand-crafted feature performance largely depends on the signal type and level of
experience. Hence, poor domain knowledge can result in inappropriate features that
are unable to capture some signal characteristics.
2. There is no guarantee that any given feature selection algorithm will extract the
optimal feature set.
3. Moreover, most manual features are statistical and cannot incorporate signal details,
which results in information loss.
In contrast, deep learning can automatically derive features from raw signals, allowing
automatic feature selection and the bypassing of feature selection computational costs,
and is applied to many industrial fields [19,20]. Similarly, deep learning methods have
been recently applied to processing physiological signals (such as EEG or skin resistance),
achieving comparable results with conventional methods. Martinez et al. [21] were the
first to propose CNNs to establish physiological models for emotion, resulting in many
subsequent deep emotion recognition studies. While deep learning has these advantages,
features with conflicting information can disturb the process of recognizing emotions.
Following the above-mentioned research and limitations, the research problem is
described as follows. First, there are many problems in using EEG data in real-time emotion
recognition. Second, traditional manual feature extraction does not guarantee an optimal
feature set, leading to data loss. Finally, if there is a feature with conflicting information,
it can interfere with emotion recognition [22]. Therefore, in this work, we select PPG and
EMG signals and propose a deep learning model that prevents feature interference by
extracting the common features of both signals.
This study is structured as follows. Section 2 describes the overall structure of the
proposed system, including the method of splitting the PPG and EMG signals into a single
pulse. In Section 3, the experimental environment regarding the dataset and experimental
settings and the experimental results are presented, and the performance of the proposed
emotion recognition model is compared with other studies. Finally, Section 4 contains a
summary of the paper and presents the conclusions.
As shown in Figure 1, the proposed system mainly comprises two modules. The phys-
iological coherence feature module extracts features that exhibit a correlation between the
PPG and EMG signals using a convolutional autoencoder and a two-stream CNN. Further-
more, the physiological common feature module extracts features that share both frequency
information and overall details over time using a short-time Fourier transform (STFT) and
a CVCNN. This module can contribute to successful emotion recognition by preventing
feature interference that may occur in the physiological coherence feature module.
2.2. Single-Pulse Extraction Using Peak-to-Peak Segmentation for PPG and EMG Signals
There is a variety of physiological signal analysis techniques, including time domain,
frequency domain, and geometric analyses. The most commonly used is time domain analysis,
which relies on the average cycle rate and on the difference between the longest and shortest
signal values. However, preprocessing based on an average cycle rate is inefficient here because
the aim is to capture changing trends immediately, and the difference between the longest and
shortest signals is irrelevant because the data fundamentally differ between participants.
Therefore, the captured signal was split into short periods based
on the peak value to extract the maximum amount of information within the raw signal
while minimizing any losses. These short periods of signals are often directly associated
with underlying physiological properties. Introducing even small variations in these short
periods could potentially distort the underlying properties. As a result, in order to preserve
the integrity of the signals and avoid any potential distortion of the underlying properties,
we chose not to apply any signal augmentation techniques.
Figure 2 indicates that the PPG high peaks and EMG low peaks were clearly distin-
guishable from the characteristic waveforms. However, full-length signals were difficult to
correlate with specific emotions, since emotional expressions weaken or deteriorate with
increasing measurement time. Therefore, we segmented the signals into short signals to
reflect emotion trends and eliminated any signals that differed from this trend. Regular
periodicity signals were divided into single-pulse sections. Comparing the PPG and EMG
data, we set the single-pulse data length to 86 sample points, although how this length was
segmented differed depending on the particular signal characteristics.
Segmentation criteria:
    PPG_single-pulse = [x*_{H_p} − c_L, x*_{H_p} + c_R]
    EMG_single-pulse = [x*_{L_p} − c_L, x*_{L_p} + c_R]        (1)
where x* denotes the partial (single-pulse) signal extracted from the entire signal;
H_p and L_p are the high and low peak locations, respectively; and c_L and c_R are the left and right
constants, respectively, which were assigned relative to the peak points. Figure 3
displays typical extracted signals.
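As an illustration, the Python sketch below shows one way the peak-to-peak segmentation of Equation (1) could be implemented. The peak-detection settings and the left/right constants (chosen here so that each segment spans 86 sample points) are assumptions for illustration, not the authors' exact values.

```python
# Minimal sketch of peak-to-peak single-pulse segmentation (Equation (1)).
# c_left/c_right and the EMG low-peak handling are assumptions, not the paper's
# exact configuration.
import numpy as np
from scipy.signal import find_peaks

def extract_single_pulses(signal, threshold, c_left, c_right, use_low_peaks=False):
    """Split a 1D signal into single pulses around its peaks.

    PPG uses high peaks; EMG uses low peaks (found as peaks of the negated signal).
    """
    x = -signal if use_low_peaks else signal
    peaks, _ = find_peaks(x, height=threshold)
    pulses = []
    for p in peaks:
        start, end = p - c_left, p + c_right
        if start >= 0 and end <= len(signal):      # keep only complete segments
            pulses.append(signal[start:end])
    return np.stack(pulses) if pulses else np.empty((0, c_left + c_right))

# Hypothetical constants giving 86-point pulses (c_left + c_right = 86):
# ppg_pulses = extract_single_pulses(ppg, threshold=0.15, c_left=30, c_right=56)
# emg_pulses = extract_single_pulses(emg, threshold=1.2, c_left=30, c_right=56,
#                                    use_low_peaks=True)
```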
Using the entire signal does not always help in recognizing emotions. Rather, the
signals typically contain artifact noise which distorts the signal waveform and complicates
the fitting task. Using the entire signal is also rather complicated because we have to
consider the possibility of each emotion starting in an arbitrary time frame [23–25]. In order
to efficiently recognize emotions, it was necessary to determine the appropriate input length
for the deep learning model after properly segmenting the signal. Through experiments, we found
that the appropriate input length is between 10 and 15 pulses. Furthermore, normalization is
essential when processing data that vary from person to person, such as biosignals.
The maximum or minimum value of the signal (the amplitude) differs from person to
person. Therefore, to detect peaks reliably, appropriate threshold values must be
determined. For this purpose, a quartile analysis was applied to all peak values.
A quartile analysis is a statistical method used to divide a set of data into four equal
parts (quartiles). The data are sorted in ascending order, and three cut points are selected
that divide the data into four groups, each containing an equal number of observations.
These cut points are often denoted as Q1, Q2 (the median), and Q3. A quartile analysis is
useful for understanding the distribution of a set of data, particularly when the data
contain outliers or are not normally distributed. Moreover, it provides information on the
spread, skewness, and central tendency of the data.
Using this method, the threshold values that retain the maximum information without
loss were found to be 0.15 for PPG and 1.2 for EMG. Figure 4 shows the single-pulse appearance
of the PPG and EMG signals when various thresholds (including the appropriate threshold values)
are applied.
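For illustration, the sketch below applies a quartile analysis to the detected peak amplitudes to derive a threshold; the choice of quartile used as the cut point is an assumption, since the paper only reports the resulting values (0.15 for PPG and 1.2 for EMG).

```python
# Minimal sketch of the quartile analysis used to pick a peak-detection threshold.
# Taking the first quartile (Q1) of all peak amplitudes is an assumed rule for
# illustration; Q2 or Q3 would be equally plausible cut points.
import numpy as np
from scipy.signal import find_peaks

def quartile_threshold(signal, negate=False):
    x = -signal if negate else signal
    peaks, _ = find_peaks(x)               # all candidate peaks, no threshold yet
    amplitudes = x[peaks]
    q1, q2, q3 = np.percentile(amplitudes, [25, 50, 75])
    return q1                              # assumed cut point

# ppg_threshold = quartile_threshold(ppg)               # reported value: 0.15
# emg_threshold = quartile_threshold(emg, negate=True)  # reported value: 1.2
```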
The CAE combines the local convolutional connection with the autoencoder, simply adding a
convolution operation to the inputs. Correspondingly, a CAE consists of
a convolutional encoder and a convolutional decoder. The convolutional encoder realizes
the process of convolutional conversion from the input to the feature maps, while the
convolutional decoder implements the convolutional conversion from the feature maps to
the output. In a CAE, the extracted features and the reconstructed output are calculated
through the CNN. Thus, (2) and (3) can be rewritten as follows:
where ω represents the convolutional kernel between the input and the code y, and ω′
represents the convolutional kernel between the code y and the output. Terms b and b′
are the biases. Moreover, the parameters of the encoding and decoding operations can be
computed using unsupervised greedy training. The proposed architecture of a CAE for 1D
signals is shown in Figure 6.
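A minimal PyTorch sketch of such a 1D convolutional autoencoder is given below; the channel counts, kernel sizes, and strides are illustrative assumptions rather than the exact architecture of Figure 6.

```python
# Minimal sketch of a 1D convolutional autoencoder (CAE) for physiological signals.
# Layer widths and kernel sizes are assumptions, not the paper's exact settings.
import torch
import torch.nn as nn

class CAE1D(nn.Module):
    def __init__(self, in_channels=1, latent_channels=16):
        super().__init__()
        # Convolutional encoder: input -> feature maps (latent code y)
        self.encoder = nn.Sequential(
            nn.Conv1d(in_channels, 8, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
            nn.Conv1d(8, latent_channels, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
        )
        # Convolutional decoder: latent code y -> reconstructed input
        # (assumes an even input length so the output size matches the input)
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(latent_channels, 8, kernel_size=5, stride=2,
                               padding=2, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose1d(8, in_channels, kernel_size=5, stride=2,
                               padding=2, output_padding=1),
        )

    def forward(self, x):
        y = self.encoder(x)          # latent vector used by the coherence module
        x_hat = self.decoder(y)      # reconstruction used for unsupervised training
        return x_hat, y

# Unsupervised training with a reconstruction loss, e.g.:
# model = CAE1D(); loss = nn.MSELoss()(model(batch)[0], batch)
```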
Figure 5. General structure of an autoencoder in which the encoder and decoder are neural networks.
2.3.2. Feature Extraction of Physiological Coherence Features for PPG and EMG Signals
In the previous section, each latent vector of the PPG and EMG signals was extracted
through the CAE. Data compression of the signal was achieved through a dimensionality
reduction, which is the main role of the autoencoder, allowing essential information about
the signals to be extracted. To extract the physiological coherence features through this
latent vector, a feature extraction module was constructed, as depicted in Figure 7. In the
physiological coherence feature module, starting with latent vectors for each PPG and EMG
signal, emotion-related features were extracted using the following process. Moreover, the
features extracted in this way are complementary to PPG and EMG and contain information
about the overall details over time.
First, effective features related to emotions were obtained through a 1D convolutional
layer in each latent vector of the PPG and EMG signals. Complementary features of the PPG
and EMG signals were then extracted by concatenating each feature and passing through
the 1D convolutional layer again. When extracting features from the PPG and EMG signals,
after the first 1D convolutional layer of each process, batch normalization and max pooling
were performed to solve the problem of internal covariate shift and to transfer strong
features with emotional characteristics to the next layer. However, while performing max
pooling and passing strong features with emotional characteristics to the next layer, delicate
representations may be discarded which can capture sophisticated emotional information.
Therefore, only batch normalization was performed at the second convolutional layer. In a
situation where features are fused through concatenation, this could be performed after
arranging features in a row through flattening, as shown in Figure 8a. However, we did
not employ flattening to preserve the overall details over time (also known as temporal
information). Instead, temporal-wise concatenation was performed to ensure that it could
be fused by time steps, as shown in Figure 8b.
Figure 8. (a) Feature fusion using flattening; (b) feature fusion without flattening.
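The following PyTorch sketch illustrates the physiological coherence feature module described above, with two convolutional streams and temporal-wise (channel-axis) concatenation instead of flattening; the layer widths and kernel sizes are assumptions.

```python
# Minimal sketch of the physiological coherence feature module: two 1D convolutional
# streams (PPG and EMG latent vectors), batch normalization and max pooling after the
# first convolution, batch normalization only after the second, and temporal-wise
# concatenation. Channel counts and kernel sizes are assumptions.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, pool):
    layers = [nn.Conv1d(in_ch, out_ch, kernel_size=3, padding=1),
              nn.BatchNorm1d(out_ch), nn.ReLU()]
    if pool:
        layers.append(nn.MaxPool1d(2))   # pass only strong emotion-related features
    return nn.Sequential(*layers)

class CoherenceFeatureModule(nn.Module):
    def __init__(self, latent_channels=16):
        super().__init__()
        # First conv layer per stream: BN + max pooling; second layer: BN only
        self.ppg_stream = nn.Sequential(conv_block(latent_channels, 32, pool=True),
                                        conv_block(32, 32, pool=False))
        self.emg_stream = nn.Sequential(conv_block(latent_channels, 32, pool=True),
                                        conv_block(32, 32, pool=False))
        # Fusion after temporal-wise concatenation, then another 1D convolution
        self.fusion = conv_block(64, 64, pool=False)

    def forward(self, ppg_latent, emg_latent):
        p = self.ppg_stream(ppg_latent)        # (batch, 32, T')
        e = self.emg_stream(emg_latent)        # (batch, 32, T')
        fused = torch.cat([p, e], dim=1)       # concatenate channels; time steps stay aligned
        return self.fusion(fused)              # complementary PPG/EMG features
```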
2.4.1. Signal Conversion from Time Domain to Time-Frequency Domain Using the
Short-Time Fourier Transform
The STFT is a Fourier-related transform used to determine the sinusoidal frequency
and phase content of local sections of a signal as it changes over time [29]. Although
the fast Fourier transform (FFT) can clearly indicate the frequency content of a signal, it
cannot easily show how that frequency content changes over time. In contrast, the STFT can
track the frequency content over time because it applies the Fourier transform to successive
short sections of the signal.
The process of the STFT is depicted in Figure 9. Here, the hop length is the distance the
window advances from the current section to the next section, and the overlap length is
the amount by which consecutive windows overlap. The resulting
value of the STFT comprises a complex number that contains information on both the
magnitude and phase. Therefore, the use of complex numbers is inevitable when using
both the magnitude and phase information of the STFT as shown in Figure 10.
Figure 10. STFT process for the physiological common feature module.
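As a brief illustration, the SciPy sketch below converts a signal to the time-frequency domain and keeps the complex STFT result; the sampling rate, window, and overlap lengths are assumed values, not the paper's settings.

```python
# Minimal sketch of the time-domain to time-frequency-domain conversion via the STFT.
# fs, window_length, and overlap_length are assumptions for illustration.
import numpy as np
from scipy.signal import stft

def to_time_frequency(x, fs=100, window_length=32, overlap_length=24):
    # hop length = window_length - overlap_length
    f, t, Z = stft(x, fs=fs, nperseg=window_length, noverlap=overlap_length)
    magnitude, phase = np.abs(Z), np.angle(Z)   # both are retained for the CVCNN
    return Z, magnitude, phase                  # Z is complex-valued

# Z, mag, phase = to_time_frequency(pulse_sequence)
```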
The complex-valued network (CVN) has the same structure as a real-valued network (RVN),
as depicted in Figure 11. However, the weights, inputs, and outputs of CVNs all exist in the
complex domain. Therefore, they can be applied to various fields that use complex-valued
systems [30,31].
In the complex generalization, both the kernel and input patch are complex values. The
only difference stems from the nature of multiplying complex numbers. When convolving
a complex matrix with the kernel W = A + iB, the output corresponding to the input patch
Z = X + iY is given by
Z · W = ( X · A − Y · B) + i( X · B + Y · A) (7)
To implement the same functionality with a real-valued convolution, the input and
output should be equivalent. Each complex matrix is represented by two real matrices
stacked together in a three-dimensional array. Denoting this array [X, Y], it is equivalent to
X + iY. X and Y are the array’s channels (Figure 12).
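The sketch below implements this complex convolution with two real-valued convolutions, following Equation (7); the layer sizes are illustrative assumptions.

```python
# Minimal sketch of Equation (7): a complex-valued convolution built from real-valued
# convolutions, with the complex kernel W = A + iB and input Z = X + iY each stored
# as real arrays. Channel counts are assumptions.
import torch
import torch.nn as nn

class ComplexConv2d(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size=3, padding=1):
        super().__init__()
        self.conv_A = nn.Conv2d(in_channels, out_channels, kernel_size, padding=padding)  # real part A
        self.conv_B = nn.Conv2d(in_channels, out_channels, kernel_size, padding=padding)  # imaginary part B

    def forward(self, X, Y):
        # Z * W = (X*A - Y*B) + i(X*B + Y*A)
        real = self.conv_A(X) - self.conv_B(Y)
        imag = self.conv_B(X) + self.conv_A(Y)
        return real, imag

# Usage: feed the real and imaginary parts of the STFT map as X and Y, e.g.
# conv = ComplexConv2d(1, 8); real, imag = conv(stft_real, stft_imag)
```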
2.4.3. Feature Extraction of Physiological Common Features for PPG and EMG Signals
As mentioned at the beginning of this section, there are shortcomings in extracting each
feature separately. Therefore, to address these shortcomings, features were extracted while
preserving the signals' overall details over time and their frequency-band information by
applying the STFT and CVNN, as explained previously.
Figure 13 shows the structure of the proposed physiological common feature module.
As shown in Figure 13, the common features of the two biosignals (PPG and EMG)
were extracted in this study, because extracting the features of PPG and EMG separately is
an inherently inefficient method. In other words, selecting individual features provides too
much input data in single-task learning and creates the possibility that each feature would
adversely affect other features and interfere with the task to be performed. In addition, the
resultant value of the STFT is composed of complex numbers and includes information
on both magnitude and phase. Therefore, to use both the magnitude and phase information
of the STFT, the use of complex numbers is inevitable. Accordingly, a CVNN was used to
extract the features.
For these reasons, we propose to extract the common features of the PPG and EMG
signals through a CVNN instead of extracting the features of each signal separately.
Therefore, the total structure of our proposed real-time emotion recognition system
can be represented as shown in Figure 14.
3. Experimental Results
3.1. Datasets
It is important to decide which dataset to use for a study since the type and characteris-
tics of a dataset have a significant influence on the results. In particular, datasets containing
only physiological signals (not image-generated datasets) are required for emotion recogni-
tion through physiological signals. We required a dataset containing PPG and EMG signals;
thus, among the available datasets, we chose the DEAP dataset [12]. Moreover, we created
a dataset, EDPE, for more granular emotions (as used in a previous study [32]).
Emotions can be affected by many factors, and each emotion has fuzzy boundaries.
Therefore, it is ambiguous to quantify emotions or define them using objective criteria. Var-
ious models that define emotion have been developed, although most emotion recognition
studies use Russell’s circumplex theory [33], which assumes emotions are distributed in a
two-dimensional circular space with arousal and valence dimensions. Generally, arousal
is considered as the vertical axis and valence the horizontal, with the origin (circle center)
representing neutral valence and medium arousal level.
As shown in Figure 15, emotional states can be represented at any valence and arousal
level. For example, “Excited” has high arousal and high valence, whereas “Depressed” has
low arousal and low valence. Emotions can manifest in various ways, and current emotion
recognition systems are generally based on facial expressions, voice, gestures, and text.
9, except for familiarity (which was a discrete scale from 1 to 5). Thirty-two participants
first put on a device that can collect the signals and started the device three seconds before
watching the video to measure the signals when they were in a calm state. After that, they
watched the videos and started the self-assessment after the video was finished. This step
was repeated to collect the signals. The signals were measured at 512 Hz and then
downsampled to 128 Hz. The dataset is summarized in Table 1.
these four-step self-assessments, emotions are classified into the 16 areas shown in Figure 16,
rather than four areas. This makes it possible to recognize emotions at the finer level of emotions
defined by adjectives. The overall experimental process is as follows. First, participants
attach a sensor and wait in a normal state for 10 min without measuring the signals. After
that, they watch videos corresponding to the four quadrants of Russell’s model while the
signals are measured. After each video finishes, they start a self-assessment. The measured
signals are PPG and EMG, which are sampled at 100 Hz, as summarized in
Table 2.
Table 3 shows a comparison of emotion recognition results on the DEAP dataset. The
proposed method achieved accuracies of 75.76% and 74.32% for arousal and valence,
respectively. Except for the studies in [10,34,35], the proposed method shows the best
performance and is also superior in terms of the recognition interval. Compared to the
proposed method, the method in [34] performs worse in valence but better in arousal.
Conversely, the method in [10] performs well in valence but not in arousal. The method
in [35] outperforms the proposed method in both valence and arousal. However, a direct
comparison is difficult because these three studies use a longer recognition interval than
the proposed method.
Table 3. Comparison of emotion recognition results on the DEAP dataset.

Method | Recognition Interval | Signals | Arousal Accuracy | Valence Accuracy
Naïve Bayes with Statistical Features (Koelstra, 2011) [12] | 63 s | GSR, RSP, SKT, PPG, EMG, EOG | 57% | 62.7%
CNN (Martinez, 2013) [21] | 30 s | BVP, SC | 69.1% | 63.3%
DBN (Xu, 2016) [36] | 60 s | EEG | 69.8% | 66.9%
Deep Sparse AE (Zhang, 2017) [34] | 20 s | RSP | 80.78% | 73.06%
MEMD (Mert, 2018) [37] | 60 s | EEG | 75% | 72.87%
SAE-LSTM (Xing, 2019) [10] | 60 s | EEG | 74.38% | 81.1%
HOLO-FM (Topic, 2021) [35] | 60 s | EEG | 77.72% | 76.61%
Proposed Method | 15 s | PPG, EMG | 75.76% | 74.32%
Table 4. Re-comparison with the top-3 studies in Table 3 (with the recognition interval set to 15 s).

Method | Recognition Interval | Signals | Arousal Accuracy | Valence Accuracy
Deep Sparse AE (Zhang, 2017) [34] | 15 s | RSP | 69.8% | 70.67%
SAE-LSTM (Xing, 2019) [10] | 15 s | EEG | 54.46% | 50.98%
HOLO-FM (Topic, 2021) [35] | 15 s | EEG | 70.54% | 72.32%
Proposed Method | 15 s | PPG, EMG | 75.76% | 74.32%
Figure 18. Classification accuracy of the EDPE dataset according to pulse length.
Figure 19 shows the confusion matrices of the arousal and valence results when the
number of pulses in the EDPE dataset was set to the ideal value of 10. In Figure 19, most
counts lie on the main diagonal, where the predictions and answers match, indicating
that learning was successful. The off-diagonal items with relatively high counts were
<Very High–High> and <Very Low–Low>; that is, Very High was confused with High (and
vice versa), and Very Low was confused with Low (and vice versa). Despite these cases of
confusion, the overwhelming majority of correctly predicted cases demonstrates the
excellent classification performance of the method proposed in this study.
Experiments were also conducted with various deep learning models based on a
CNN and LSTM (commonly used in deep learning models) using the same EDPE dataset.
Although CNNs are one of the most-used deep neural networks for analyzing visual images,
they have frequently been employed in recent emotion recognition research by analyzing
patterns of adjacent physiological signals. Therefore, we compared the performance of
CNNs and models that combined a stacked autoencoder and a CNN or LSTM. Finally,
the performance of the bimodal stacked sparse autoencoder [32] was compared. Table 5
summarizes the experimental results of emotion recognition.
As shown in Table 5, the performance was low when recognizing emotions using
LSTM. This result indicated that the data were not just time dependent but also more
complex. Therefore, this suggested that improved results could be obtained by analyzing
data patterns using a fully connected layer and a convolutional layer. As a result, our
proposed model outperformed the other deep learning models.
Table 5. Emotion recognition results of the compared deep learning models on the EDPE dataset.

Model | Dataset | Recognition Interval | Arousal Accuracy | Valence Accuracy
CNN | EDPE dataset | 10 s | 70.24% | 74.34%
Stacked Auto-encoder + CNN | EDPE dataset | 10 s | 71.47% | 72.01%
Stacked Auto-encoder + LSTM | EDPE dataset | 10 s | 61.03% | 59.25%
Bimodal-Stacked Auto-encoder [32] | EDPE dataset | 10 s | 75.86% | 80.18%
Proposed Model | EDPE dataset | 10 s | 84.84% | 86.50%
Recognizing the highs and lows of arousal and valence is very different from recognizing
emotions themselves. Being able to recognize arousal well does not necessarily mean that
valence can also be recognized well, and vice versa. In other words, recognizing emotions,
in which both the arousal and valence criteria are applied simultaneously as shown in
Figure 16, is a more complicated and difficult problem than recognizing high and low levels
of arousal or valence alone. Therefore, to recognize emotions, we reconstructed the EDPE
dataset with data and labels for each of the 16 emotions in Figure 16, and training and
testing were conducted by splitting the samples into 80% for training and 20% for testing.
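The following sketch illustrates, under assumed variable names and placeholder data, how four-level arousal and valence self-assessments could be mapped to 16 emotion classes and split 80%/20%; it is not the authors' exact reconstruction procedure.

```python
# Minimal sketch of rebuilding labels as 16 emotion classes from four-level
# arousal/valence self-assessments and splitting 80%/20%. All variable names and
# the placeholder data below are hypothetical.
import numpy as np
from sklearn.model_selection import train_test_split

def to_emotion_class(arousal_level, valence_level):
    """Map four-level arousal and valence (each 0..3) to one of 16 emotion classes."""
    return arousal_level * 4 + valence_level

# Placeholder feature matrix and self-assessment levels (balanced for illustration).
signals = np.random.default_rng(0).standard_normal((160, 860))
arousal_levels = np.repeat(np.arange(4), 40)
valence_levels = np.tile(np.arange(4), 40)
labels = to_emotion_class(arousal_levels, valence_levels)

X_train, X_test, y_train, y_test = train_test_split(
    signals, labels, test_size=0.2, stratify=labels, random_state=0)
```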
Table 6 presents the recognition results for the 16 emotions, which displayed an
average recognition accuracy of 82.52%. Although this result was slightly lower than the
recognition accuracy for arousal and valence, it was sufficiently accurate to be applied
successfully in real-life scenarios, considering the difficulty of recognizing 16 emotions
compared to each recognition task for arousal and valence.
Table 6. Results of emotion recognition for sixteen emotions by the proposed model.

Quadrant | Emotion (Accuracy)
Quadrant I (HVHA) | Astonished (85.37%), Convinced (87.09%), Excited (81.34%), Delighted (80.20%)
Quadrant II (LVHA) | Distress (78.35%), Disgust (80.97%), Annoyed (75.26%), Impatient (82.97%)
Quadrant III (LVLA) | Sad (79.61%), Anxious (82.89%), Worried (77.49%), Bored (90.04%)
Quadrant IV (HVLA) | Confident (85.08%), Serious (83.24%), Pleased (86.87%), Calm (83.40%)

HVHA: High Valence High Arousal, LVLA: Low Valence Low Arousal, LVHA: Low Valence High Arousal, HVLA: High Valence Low Arousal.
4. Conclusions
This paper proposed a novel approach for real-time emotion recognition using physio-
logical signals (PPG and EMG) through the extraction of physiologically common features
via a CVCNN. The results indicated that the proposed approach achieved an accuracy of
81.78%, which is competitive with existing methods. Furthermore, we confirmed that the
recognition interval was significantly shorter than in other studies, rendering the proposed
method suitable for real-time emotion recognition.
The findings of this study suggest that the proposed approach has the potential
to be applied in various fields, such as healthcare, human–computer interactions, and
affective computing. Moreover, this study provides insights into the relationship between
physiological signals and emotions, which can further advance our understanding of the
human affective system.
While the proposed approach shows promise in real-time emotion recognition using
physiological signals, there are some limitations. Firstly, the concept of cross-subject
analysis, which involves analyzing data from multiple subjects, is not incorporated in this
study. This limits the generalizability of the findings to a broader population. Next, the
experiments were conducted in a controlled laboratory setting, which may not fully capture
the range of emotions experienced in real-life situations. Therefore, there is a need for
future research to address these limitations.
In light of these limitations, future research should consider conducting experiments
in in-the-wild environments to better understand the applicability of the proposed approach in
real-world scenarios. This would provide a more comprehensive understanding of how
emotions manifest in different contexts. In addition, by exploiting the properties of
the matrix produced by the STFT, it is possible to derive novel approaches based on
spectrograms [38] or graph transformer models [39,40]. Furthermore, it is important to
expand the scope of the investigation beyond short-term emotion recognition. Long-term
emotion recognition should be explored to gain insights into how emotions evolve and
fluctuate over extended periods of time.
Moreover, future research could focus on defining and recognizing personality traits
based on changes in emotions. By studying the relationship between emotions and person-
ality, we can gain a deeper understanding of the human affective system. This would not
only contribute to the field of affective computing, but also have practical implications in
various domains, such as healthcare and human–computer interactions.
In summary, by addressing the limitations related to cross-subject analysis and con-
ducting experiments in real-life settings, future research can enhance the applicability and
generalizability of the proposed approach. Additionally, exploring long-term emotion
recognition and its connection to personality traits would provide valuable insights into
the complex nature of human emotions.
Author Contributions: Conceptualization, E.-G.H., T.-K.K. and M.-T.L.; data curation, E.-G.H.;
formal analysis, E.-G.H., T.-K.K. and M.-T.L.; methodology, E.-G.H., T.-K.K. and M.-T.L.; software,
E.-G.H. and T.-K.K.; validation, M.-T.L. and T.-K.K.; writing—original draft, E.-G.H. and T.-K.K.;
writing—review and editing, M.-T.L. and T.-K.K. All authors have read and agreed to the published
version of the manuscript.
Funding: This research was supported by the Basic Science Research Program through the National
Research Foundation of Korea (NRF) (grant no. NRF-2022R1F1A1073543).
Data Availability Statement: Not applicable.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Ali, M.; Mosa, A.H.; Al Machot, F.; Kyamakya, K. Emotion recognition involving physiological and speech signals: A comprehensive
review. In Recent Advances in Nonlinear Dynamics and Synchronization; Springer: Berlin/Heidelberg, Germany, 2018; pp. 287–302.
2. Sim, H.; Lee, W.H.; Kim, J.Y. A Study on Emotion Classification utilizing Bio-Signal (PPG, GSR, RESP). Adv. Sci. Technol. Lett.
2015, 87, 73–77.
3. Chen, J.; Hu, B.; Moore, P.; Zhang, X.; Ma, X. Electroencephalogram-based emotion assessment system using ontology and data
mining techniques. Appl. Soft Comput. 2015, 30, 663–674. [CrossRef]
4. Shu, L.; Xie, J.; Yang, M.; Li, Z.; Li, Z.; Liao, D.; Xu, X.; Yang, X. A review of emotion recognition using physiological signals.
Sensors 2018, 18, 2074. [CrossRef] [PubMed]
5. Houssein, E.H.; Hammad, A.; Ali, A.A. Human emotion recognition from EEG-based brain–computer interface using machine
learning: A comprehensive review. Neural Comput. Appl. 2022, 34, 12527–12557. [CrossRef]
6. Al-Qazzaz, N.K.; Alyasseri, Z.A.A.; Abdulkareem, K.H.; Ali, N.S.; Al-Mhiqani, M.N.; Guger, C. EEG feature fusion for motor
imagery: A new robust framework towards stroke patients rehabilitation. Comput. Biol. Med. 2021, 137, 104799. [CrossRef]
[PubMed]
7. Sung, W.T.; Chen, J.H.; Chang, K.W. Study on a real-time BEAM system for diagnosis assistance based on a system on chips
design. Sensors 2013, 13, 6552–6577. [CrossRef]
8. Wen, T.; Zhang, Z. Deep convolution neural network and autoencoders-based unsupervised feature learning of EEG signals.
IEEE Access 2018, 6, 25399–25410. [CrossRef]
9. Alhagry, S.; Fahmy, A.A.; El-Khoribi, R.A. Emotion recognition based on EEG using LSTM recurrent neural network. Int. J. Adv.
Comput. Sci. Appl. 2017, 8, 355–358. [CrossRef]
10. Xing, X.; Li, Z.; Xu, T.; Shu, L.; Hu, B.; Xu, X. SAE + LSTM: A New framework for emotion recognition from multi-channel EEG.
Front. Neurorobot. 2019, 13, 37. [CrossRef]
11. Soleymani, M.; Lichtenauer, J.; Pun, T.; Pantic, M. A multimodal database for affect recognition and implicit tagging. IEEE Trans.
Affect. Comput. 2011, 3, 42–55. [CrossRef]
12. Koelstra, S.; Muhl, C.; Soleymani, M.; Lee, J.S.; Yazdani, A.; Ebrahimi, T.; Pun, T.; Nijholt, A.; Patras, I. Deap: A database for
emotion analysis; using physiological signals. IEEE Trans. Affect. Comput. 2011, 3, 18–31. [CrossRef]
13. Zheng, W.L.; Zhu, J.Y.; Lu, B.L. Identifying stable patterns over time for emotion recognition from EEG. IEEE Trans. Affect.
Comput. 2017, 10, 417–429. [CrossRef]
14. Lin, Y.P.; Wang, C.H.; Jung, T.P.; Wu, T.L.; Jeng, S.K.; Duann, J.R.; Chen, J.H. EEG-based emotion recognition in music listening.
IEEE Trans. Biomed. Eng. 2010, 57, 1798–1806. [PubMed]
15. Chanel, G.; Kronegg, J.; Grandjean, D.; Pun, T. Emotion assessment: Arousal evaluation using EEG’s and peripheral physiological
signals. In Proceedings of the Multimedia Content Representation, Classification and Security: International Workshop, MRCS
2006, Istanbul, Turkey, 11–13 September 2006; Proceedings; Springer: Berlin/Heidelberg, Germany, 2006; pp. 530–537.
16. Udovičić, G.; Ðerek, J.; Russo, M.; Sikora, M. Wearable emotion recognition system based on GSR and PPG signals. In Proceedings
of the 2nd International Workshop on Multimedia for Personal Health and Health Care, Mountain View, CA, USA, 23 October
2017; pp. 53–59.
17. Li, C.; Xu, C.; Feng, Z. Analysis of physiological for emotion recognition with the IRS model. Neurocomputing 2016, 178, 103–111.
[CrossRef]
18. Lee, Y.K.; Kwon, O.W.; Shin, H.S.; Jo, J.; Lee, Y. Noise reduction of PPG signals using a particle filter for robust emotion recognition.
In Proceedings of the 2011 IEEE International Conference on Consumer Electronics—Berlin (ICCE—Berlin), Berlin, Germany, 3–6
September 2011; pp. 202–205.
19. Noroznia, H.; Gandomkar, M.; Nikoukar, J.; Aranizadeh, A.; Mirmozaffari, M. A Novel Pipeline Age Evaluation: Considering
Overall Condition Index and Neural Network Based on Measured Data. Mach. Learn. Knowl. Extr. 2023, 5, 252–268. [CrossRef]
20. Mirmozaffari, M.; Yazdani, M.; Boskabadi, A.; Ahady Dolatsara, H.; Kabirifar, K.; Amiri Golilarz, N. A novel machine learning
approach combined with optimization models for eco-efficiency evaluation. Appl. Sci. 2020, 10, 5210. [CrossRef]
21. Martinez, H.P.; Bengio, Y.; Yannakakis, G.N. Learning deep physiological models of affect. IEEE Comput. Intell. Mag. 2013,
8, 20–33. [CrossRef]
22. Ozbulak, U.; Gasparyan, M.; Rao, S.; De Neve, W.; Van Messem, A. Exact Feature Collisions in Neural Networks. arXiv 2022,
arXiv:2205.15763.
23. Wu, C.K.; Chung, P.C.; Wang, C.J. Representative segment-based emotion analysis and classification with automatic respiration
signal segmentation. IEEE Trans. Affect. Comput. 2012, 3, 482–495. [CrossRef]
24. Picard, R.W.; Vyzas, E.; Healey, J. Toward machine emotional intelligence: Analysis of affective physiological state. IEEE Trans.
Pattern Anal. Mach. Intell. 2001, 23, 1175–1191. [CrossRef]
25. Zeng, Z.; Pantic, M.; Roisman, G.I.; Huang, T.S. A survey of affect recognition methods: Audio, visual and spontaneous expressions.
In Proceedings of the 9th International Conference on Multimodal Interfaces, Aichi, Japan, 12–15 April 2007; pp. 126–133.
26. Masci, J.; Meier, U.; Cireşan, D.; Schmidhuber, J. Stacked convolutional auto-encoders for hierarchical feature extraction. In
Proceedings of the Artificial Neural Networks and Machine Learning—ICANN 2011: 21st International Conference on Artificial
Neural Networks, Espoo, Finland, 14–17 June 2011; Proceedings, Part I 21; Springer: Berlin/Heidelberg, Germany, 2011; pp.
52–59.
27. Wang, Y.; Xie, Z.; Xu, K.; Dou, Y.; Lei, Y. An efficient and effective convolutional auto-encoder extreme learning machine network
for 3d feature learning. Neurocomputing 2016, 174, 988–998. [CrossRef]
28. Huang, H.; Hu, X.; Zhao, Y.; Makkie, M.; Dong, Q.; Zhao, S.; Guo, L.; Liu, T. Modeling task fMRI data via deep convolutional
autoencoder. IEEE Trans. Med. Imaging 2017, 37, 1551–1561. [CrossRef] [PubMed]
29. Sejdic, E.; Djurovic, I.; Jiang, J. Time–frequency feature representation using energy concentration: An overview of recent
advances. Digit. Signal Process. 2009, 19, 153–183. [CrossRef]
30. Amin, M.F.; Murase, K. Single-layered complex-valued neural network for real-valued classification problems. Neurocomputing
2009, 72, 945–955. [CrossRef]
31. Zimmermann, H.G.; Minin, A.; Kusherbaeva, V.; Germany, M. Comparison of the complex valued and real valued neural
networks trained with gradient descent and random search algorithms. In Proceedings of the of ESANN 2011, Bruges, Belgium,
27–29 April 2011.
32. Lee, Y.K.; Pae, D.S.; Hong, D.K.; Lim, M.T.; Kang, T.K. Emotion Recognition with Short-Period Physiological Signals Using
Bimodal Sparse Autoencoders. Intell. Autom. Soft Comput. 2022, 32, 657–673. [CrossRef]
33. Russell, J.A. A circumplex model of affect. J. Personal. Soc. Psychol. 1980, 39, 1161. [CrossRef]
34. Zhang, Q.; Chen, X.; Zhan, Q.; Yang, T.; Xia, S. Respiration-based emotion recognition with deep learning. Comput. Ind. 2017,
92, 84–90. [CrossRef]
35. Topic, A.; Russo, M. Emotion recognition based on EEG feature maps through deep learning network. Eng. Sci. Technol. Int. J.
2021, 24, 1442–1454. [CrossRef]
36. Xu, H.; Plataniotis, K.N. EEG-based affect states classification using deep belief networks. In Proceedings of the IEEE 2016 Digital
Media Industry & Academic Forum (DMIAF), Santorini, Greece, 4–6 July 2016; pp. 148–153.
37. Mert, A.; Akan, A. Emotion recognition from EEG signals by using multivariate empirical mode decomposition. Pattern Anal.
Appl. 2018, 21, 81–89. [CrossRef]
38. Pusarla, N.; Singh, A.; Tripathi, S. Learning DenseNet features from EEG based spectrograms for subject independent emotion
recognition. Biomed. Signal Process. Control. 2022, 74, 103485. [CrossRef]
39. Yun, S.; Jeong, M.; Kim, R.; Kang, J.; Kim, H.J. Graph transformer networks. Adv. Neural Inf. Process. Syst. 2019, 32. Available
online: https://proceedings.neurips.cc/paper_files/paper/2019/file/9d63484abb477c97640154d40595a3bb-Paper.pdf (accessed
on 22 May 2023).
40. Dwivedi, V.P.; Bresson, X. A generalization of transformer networks to graphs. arXiv 2020, arXiv:2012.09699.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.