

Article
Physiological Signal-Based Real-Time Emotion Recognition
Based on Exploiting Mutual Information with Physiologically
Common Features
Ean-Gyu Han 1, Tae-Koo Kang 2,* and Myo-Taeg Lim 1,*

1 School of Electrical Engineering, Korea University, Seoul 02841, Republic of Korea; wlrdmlrja@korea.ac.kr
2 Department of Human Intelligence and Robot Engineering, Sangmyung University,
Cheonan 31066, Republic of Korea
* Correspondence: tkkang@smu.ac.kr (T.-K.K.); mlim@korea.ac.kr (M.-T.L.)

Abstract: This paper proposes a real-time emotion recognition system that utilizes photoplethysmog-
raphy (PPG) and electromyography (EMG) physiological signals. The proposed approach employs a
complex-valued neural network to extract common features from the physiological signals, enabling
successful emotion recognition without interference. The system comprises three stages: single-pulse
extraction, a physiological coherence feature module, and a physiological common feature module.
The experimental results demonstrate that the proposed method surpasses alternative approaches
in terms of accuracy and the recognition interval. By extracting common features of the PPG and
EMG signals, this approach achieves effective emotion recognition without mutual interference. The
findings provide a significant advancement in real-time emotion analysis and offer a clear and concise
framework for understanding individuals’ emotional states using physiological signals.

Keywords: emotion recognition; physiological signal; PPG; EMG; multimodal network; convolutional
autoencoder; short-time Fourier transform (STFT); complex-valued convolutional neural network
(CVCNN)

Citation: Han, E.-G.; Kang, T.-K.; Lim, M.-T. Physiological Signal-Based Real-Time Emotion Recognition Based on Exploiting Mutual Information with Physiologically Common Features. Electronics 2023, 12, 2933. https://doi.org/10.3390/electronics12132933

Academic Editors: Vladimir Laslo Tadić and Peter Odry

Received: 23 May 2023; Revised: 27 June 2023; Accepted: 30 June 2023; Published: 3 July 2023

Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

Ergonomics, which supports design based on scientific and engineering analyses of human physical, cognitive, social, and emotional characteristics, is of increasing importance. Ergonomics encompasses human engineering, biomechanics, cognitive engineering, human–computer interaction (HCI), emotional engineering, and user experience (UX). Sophisticated technologies continue to be developed for measurement, experimentation, analysis, design, and evaluation. In particular, HCI has become an important field that has attracted extensive research, resulting in significant advances and expansion in a variety of areas, including recognizing and using emotions in computers.

Emotion recognition plays an important role in HCI, facilitating interactions between humans and intelligent devices (such as computers, smartphones, and the IoT). There are many ways to express emotions, including facial expressions, voice, gestures, text, and physiological signals, and each method has its advantages [1]. Facial expressions appear as facial images, and these image data can be acquired easily in various ways. Voice signals provide useful reference information, as they are widely used in various fields. Gestures can express people's emotions clearly, and text can be acquired easily through crawling or scraping. However, the characteristics of voice and text differ between countries. Moreover, gestures have limitations because it is difficult to obtain a dataset, as this requires complex processes, such as recognizing a person's physical appearance. Furthermore, facial expressions, voice, gestures, and text can all be intentionally controlled, so the reliability of recognizing a person's actual emotions from them is low. In contrast, since physiological signals are related to the central nervous system, the emotional responses they reflect cannot be deliberately controlled. Accordingly, the reliability of physiological signals when recognizing emotions is guaranteed [2–4].
Therefore, psychological studies focusing on the relationship between physiological signals
(including electroencephalography (EEG), photoplethysmography (PPG), and electromyo-
graphy (EMG) signals) and emotions have been conducted and applied in various fields.
The reason for this research interest is that physiological reactions can reflect dynamic
changes in the central nervous system, which are difficult to hide compared to emotions
expressed through words or facial expressions.
Among the physiological signals, EEG signals are the most commonly used for emotion
recognition [5–7] because they are directly related to the central nervous system and contain
exceptional emotional features. Significant recent research using EEG signals has focused
on extracting EEG features using deep-learning-based methods. Wen et al. [8] proposed a
deep convolutional neural network (CNN) and an autoencoder to extract relevant emotion-
specific features from EEG signals. Alhagry et al. [9] proposed a long short-term memory
(LSTM) approach to classify emotions using EEG signals, and Xing et al. [10] proposed
a framework for emotion recognition using multi-channel EEG signals. Transitions in
emotional states are usually accompanied by changes in the power spectrum of the EEG.
Previous studies have also reported that alpha-band spectral differences in EEG signals over the anterior brain region can generally capture different emotional states. Moreover, different spectral changes between brain regions are also associated with emotional responses, such as theta- and gamma-band power changes at the right parietal lobe, theta-band power changes at the frontal midline, and asymmetry of the beta-band power at the parietal region.
Soleymani et al. [11] proposed a multimodal dataset termed MAHNOB-HCI for emo-
tion recognition and implicate tagging research. Based on this dataset, they obtained an
EEG spectral output and the valence score of the electrodes and calculated a correlation be-
tween them. They also revealed that higher frequency components on the frontal, parietal,
and occipital lobes had a higher correlation with self-assessment-based valence responses.
Furthermore, they improved the classification performance for continuous emotion recog-
nition by fusing the power spectral density (PSD) and facial features. Koelstra et al. [12]
presented a multimodal dataset for the analysis of human affective states termed DEAP
and extracted the spectral power features of five frequency bands from 32 participants.
In another study, Zheng et al. [13] presented a dataset termed SEED for analyzing stable
patterns across sessions. Lin et al. [14] evaluated features specialized for emotions based
on the power spectral changes of EEG signals and assessed the relationship between EEG
dynamics and music-induced emotional states. They revealed that emotion-specific fea-
tures from the frontal and parietal lobes could provide discriminative information related
to emotion processing. Finally, Chanel et al. [15] employed the naive Bayes classifier to
categorize three arousal-assessment-based emotion classes from specific frequency bands
at particular electrode locations.
Despite the previously mentioned advantages, there are some limitations when using
EEG signals. First, EEG recordings can capture simple partial seizures or, rarely, complex partial seizures (particularly with frontal onset), and interpreting EEG signals can become difficult or impossible due to the body movements that generate excessive artifacts during such events. Therefore, knowledge of the relevant clinical seizures that can accompany EEG changes is required. Moreover, EEG signals have a high dimensionality, requiring diverse and difficult
processing, rendering subsequent analyses difficult. Finally, signal processing requires
complicated algorithms to analyze brainwave signals, and multiple EEG electrodes must
be attached to subjects to collect reliable brainwave data. For these reasons, it is very
difficult to gather practical EEG data applicable to real life, even if good classification can
be achieved with follow-up analyses. To avoid this limitation, we selected PPG and EMG
signals to recognize emotions rather than EEG, as they contain extensive emotion-specific
information and can be incorporated into wearable devices practically [16–18]. Thus, they
are easily measurable and somewhat less complex to analyze compared to EEG signals.
Therefore, in this study, we paid attention to emotion recognition using a deep learning
model based on PPG and EMG signals.
Psychologists and engineers have attempted to analyze these data to explain and cate-
gorize emotions. Although there are strong relationships between physiological signals and
human emotional states, traditional manual feature extraction suffers from fundamental
limitations to describe emotion-related characteristics from physiological signals.
1. Hand-crafted feature performance largely depends on the signal type and level of
experience. Hence, poor domain knowledge can result in inappropriate features that
are unable to capture some signal characteristics.
2. There is no guarantee that any given feature selection algorithm will extract the
optimal feature set.
3. Moreover, most manual features are statistical and cannot incorporate signal details,
which results in information loss.
In contrast, deep learning can automatically derive features from raw signals, allowing
automatic feature selection and the bypassing of feature selection computational costs,
and is applied to many industrial fields [19,20]. Similarly, deep learning methods have
been recently applied to processing physiological signals (such as EEG or skin resistance),
achieving comparable results with conventional methods. Martinez et al. [21] were the
first to propose CNNs to establish physiological models for emotion, resulting in many
subsequent deep emotion recognition studies. While deep learning has these advantages,
features with conflicting information can disturb the process of recognizing emotions.
Following the above-mentioned research and limitations, the research problem is
described as follows. First, there are many problems in using EEG data in real-time emotion
recognition. Second, traditional manual feature extraction does not guarantee an optimal
feature set, leading to data loss. Finally, if there is a feature with conflicting information,
it can interfere with emotion recognition [22]. Therefore, in this work, we select PPG and
EMG signals and propose a deep learning model that prevents feature interference by
extracting the common features of both signals.
This study is structured as follows. Section 2 describes the overall structure of the
proposed system, including the method of splitting the PPG and EMG signals into a single
pulse. In Section 3, the experimental environment regarding the dataset and experimental
settings and the experimental results are presented, and the performance of the proposed
emotion recognition model is compared with other studies. Finally, Section 4 contains a
summary of the paper and presents the conclusions.

2. Proposed Real-Time Emotion Recognition System


2.1. Overview of the Proposed Real-Time Emotion Recognition System
An overview of the proposed real-time emotion recognition system developed in this
study is presented in Figure 1. To extract the emotional features of a person based on PPG
and EMG signals, a convolutional autoencoder (CAE) and a CNN-based architecture are
combined. Emotional recognition is possible with these features only, but they contain
conflicting information, which can confuse recognition. Therefore, in order to mediate the
confusion, shared emotional features are extracted from the complex-valued convolutional
neural network (CVCNN), in which the inputs are the results of a short-time Fourier trans-
form (STFT). By using the CVCNN, efficient features are acquired from the complex-valued
results of the STFT. Then, those features are concatenated and used to recognize emotions.

Figure 1. Overview of the proposed system.

As shown in Figure 1, the proposed system mainly comprises two modules. The phys-
iological coherence feature module extracts features that exhibit a correlation between the
PPG and EMG signals using a convolutional autoencoder and a two-stream CNN. Further-
more, the physiological common feature module extracts features that share both frequency
information and overall details over time using a short-time Fourier transform (STFT) and
a CVCNN. This module can contribute to successful emotion recognition by preventing
feature interference that may occur in the physiological coherence feature module.

2.2. Single-Pulse Extraction Using Peak-to-Peak Segmentation for PPG and EMG Signals
There is a variety of physiological signal analysis techniques, spanning time-domain, frequency-domain, and geometric analyses. The most commonly used is time-domain analysis, which typically relies on either the average cycle rate or the difference between the longest and shortest signal values. However, preprocessing based on an average cycle rate is inefficient when the aim is to capture changing trends immediately, and the difference between the longest and shortest signals is uninformative because the data fundamentally differ
between participants. Therefore, the captured signal was split into short periods based
on the peak value to extract the maximum amount of information within the raw signal
while minimizing any losses. These short periods of signals are often directly associated
with underlying physiological properties. Introducing even small variations in these short
periods could potentially distort the underlying properties. As a result, in order to preserve
the integrity of the signals and avoid any potential distortion of the underlying properties,
we chose not to apply any signal augmentation techniques.
Figure 2 indicates that the PPG high peaks and EMG low peaks were clearly distin-
guishable from the characteristic waveforms. However, full-length signals were difficult to
correlate with specific emotions, since emotional expressions weaken or deteriorate with
increasing measurement time. Therefore, we segmented the signals into short signals to
reflect emotion trends and eliminated any signals that differed from this trend. Regular
periodicity signals were divided into single-pulse sections. Comparing the PPG and EMG
data, we set the single-pulse data length to 86 sample points, although segmenting this
length differed depending on the particular signal characteristics.

Segmentation criteria:
    PPG_single-pulse = [x*_{H_p} − c_L, x*_{H_p} + c_R]
    EMG_single-pulse = [x*_{L_p} − c_L, x*_{L_p} + c_R]        (1)

where x* denotes the partial (single-pulse) signal extracted from the entire signal; H_p and L_p are the high and low peak locations, respectively; and c_L and c_R are the left and right constants, respectively, which are defined relative to the peak points. Figure 3
displays typical resulting extracted signals.
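To make the segmentation step concrete, the following is a minimal Python sketch of the peak-to-peak windowing in Equation (1), assuming NumPy and SciPy are available. The function name, the c_left/c_right values, and the peak-height threshold in the usage comment are illustrative choices rather than the exact settings used in the paper.

```python
import numpy as np
from scipy.signal import find_peaks

def extract_single_pulses(signal, c_left, c_right, use_low_peaks=False, height=None):
    """Cut fixed-length single-pulse windows around peaks, as in Equation (1).

    PPG pulses are taken around high peaks; EMG pulses around low peaks,
    detected here as high peaks of the inverted signal.
    """
    x = -signal if use_low_peaks else signal
    peaks, _ = find_peaks(x, height=height)          # peak locations H_p (or L_p)
    pulses = [signal[p - c_left:p + c_right]         # window [peak - c_L, peak + c_R]
              for p in peaks
              if p - c_left >= 0 and p + c_right <= len(signal)]
    return np.stack(pulses) if pulses else np.empty((0, c_left + c_right))

# Illustrative usage: 86-sample pulses (c_left + c_right = 86), PPG threshold 0.15.
# ppg_pulses = extract_single_pulses(ppg, c_left=30, c_right=56, height=0.15)
# emg_pulses = extract_single_pulses(emg, c_left=30, c_right=56, use_low_peaks=True)
```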

(a) Photoplethysmograph signal (PPG) (b) Electromyograph signal (EMG)

Figure 2. Examples of PPG and EMG signals.

(a) Segmented PPG signals (b) Segmented EMG signals

Figure 3. Results of single-pulse segmentation for PPG and EMG signals.

Using the entire signal does not always help in recognizing emotions. Rather, the
signals typically contain artifact noise which distorts the signal waveform and complicates
the fitting task. Using the entire signal is also rather complicated because we have to
consider the possibility of each emotion starting in an arbitrary time frame [23–25]. In order
to efficiently recognize emotions, it was necessary to determine the appropriate length to
input to the deep learning model after properly segmenting the signal. Through experiments exploring different input lengths, we found that the appropriate input signal length is between 10 and 15 pulses. Furthermore, normalization is
essential when processing data that vary from person to person, such as biosignals.
The maximum or minimum value of the signal (the amplitude) is different for each
person. Therefore, to find appropriate peak values, appropriate threshold values must be
determined. For this purpose, a quartile analysis was applied to all peak values.
A quartile analysis is a statistical method used to divide a set of data into four equal
parts (quartiles). The data are sorted in ascending order, and then three cut points are selected that divide the data into four groups, each containing an equal number of observations. These cut points are known as quartiles and are often denoted as Q1, Q2 (the median), and Q3. A quartile analysis is useful for understanding the
distribution of a set of data, particularly when the data contain outliers or are not normally
distributed. Moreover, it can provide information on the spread, skewness, and central
tendency of the data.
Using this method, the threshold value that can obtain the maximum information
without losses was 0.15 for PPG and 1.2 for EMG. Figure 4 shows the single-pulse appear-
ance of PPG and EMG when various thresholds (including appropriate threshold values)
are applied.
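The exact quartile rule is not spelled out beyond the description above, so the sketch below is one plausible NumPy/SciPy implementation: it summarizes the heights of all detected peaks with Q1, Q2, and Q3 and derives a minimum peak-height threshold from a Tukey-style fence. The lower-fence choice and the fence factor are assumptions for illustration, not values taken from the paper.

```python
import numpy as np
from scipy.signal import find_peaks

def quartile_threshold(signal, fence=1.5):
    """Derive a peak-height threshold from the quartiles of all peak heights.

    Q1, Q2 (median), and Q3 split the sorted peak heights into four equal groups;
    peaks below the lower fence Q1 - fence * IQR are treated as spurious.
    The fence rule itself is an illustrative assumption.
    """
    _, props = find_peaks(signal, height=np.min(signal))  # keep the height of every local maximum
    heights = props["peak_heights"]
    q1, _, q3 = np.percentile(heights, [25, 50, 75])
    return q1 - fence * (q3 - q1)
```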

(a) Effect on single-pulse PPG signals

(b) Effect on single-pulse EMG signals

Figure 4. Effect on single-pulse signals with different thresholds.

2.3. Physiological Coherence Feature Module


2.3.1. Convolutional Autoencoder for 1D Signals
The CAE extends the basic structure of the simple autoencoder by changing the fully
connected layers to convolutional layers [26–28]. As in the simple autoencoder, the input layer has the same size as the output layer, but the encoder network is replaced with convolutional layers and the decoder network with transposed convolutional layers.
As illustrated in Figure 5, an autoencoder consists of two parts: an encoder and a
decoder. The encoder converts the input x to a hidden representation y (feature code) using
a deterministic mapping function. Typically, this is an affine mapping function followed by
a nonlinearity, where W is the weight between the input x and the hidden representation y
and b is the bias.
y = f(Wx + b) (2)

z = f′(W′y + b′) (3)

where W′, b′, and f′ are the weights, bias, and activation of the decoder, which maps y back to the reconstruction z.

The CAE combines the local convolution connection with the autoencoder, which is a
simple step that adds a convolution operation to inputs. Correspondingly, a CAE consists of
a convolutional encoder and a convolutional decoder. The convolutional encoder realizes
the process of convolutional conversion from the input to the feature maps, while the
convolutional decoder implements the convolutional conversion from the feature maps to
the output. In a CAE, the extracted features and the reconstructed output are calculated
through the CNN. Thus, (2) and (3) can be rewritten as follows:

y = ReLU(wx + b) (4)

z = ReLU(w′y + b′) (5)

where w denotes the convolutional kernel between the input and the code y, w′ denotes the convolutional kernel between the code y and the output, and b and b′ are the biases. Moreover, the parameters of the encoding and decoding operations can be
computed using unsupervised greedy training. The proposed architecture of a CAE for 1D
signals is shown in Figure 6.

Figure 5. General structure of an autoencoder in which the encoder and decoder are neural networks.

Figure 6. Architecture of a convolutional autoencoder for 1D signals.
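A minimal Keras sketch of a 1D convolutional autoencoder in the spirit of Figure 6 and Equations (4) and (5) is given below. The filter counts, kernel sizes, and single pooling stage are illustrative assumptions; only the 86-sample pulse length is taken from the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

PULSE_LEN = 86  # single-pulse length used for the DEAP signals

def build_cae_1d(input_len=PULSE_LEN):
    """1D convolutional autoencoder: encoder y = ReLU(w*x + b), decoder z = ReLU(w'*y + b')."""
    inputs = layers.Input(shape=(input_len, 1))

    # Encoder: convolutions followed by downsampling to the latent feature map.
    x = layers.Conv1D(16, 5, padding="same", activation="relu")(inputs)
    x = layers.Conv1D(8, 5, padding="same", activation="relu")(x)
    latent = layers.MaxPooling1D(2)(x)                      # latent code of shape (input_len/2, 8)

    # Decoder: a transposed convolution restores the original resolution.
    x = layers.Conv1DTranspose(8, 5, strides=2, padding="same", activation="relu")(latent)
    x = layers.Conv1D(16, 5, padding="same", activation="relu")(x)
    outputs = layers.Conv1D(1, 5, padding="same")(x)        # reconstruction z

    autoencoder = models.Model(inputs, outputs)
    encoder = models.Model(inputs, latent)                  # latent vectors for the next module
    autoencoder.compile(optimizer="adam", loss="mse")
    return autoencoder, encoder

# Illustrative usage (pulses: array of shape (n_samples, 86, 1)):
# cae, enc = build_cae_1d()
# cae.fit(pulses, pulses, epochs=50, batch_size=64)
```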

2.3.2. Feature Extraction of Physiological Coherence Features for PPG and EMG Signals
In the previous section, each latent vector of the PPG and EMG signals was extracted
through the CAE. Data compression of the signal was achieved through a dimensionality
reduction, which is the main role of the autoencoder, allowing essential information about
the signals to be extracted. To extract the physiological coherence features through this
latent vector, a feature extraction module was constructed, as depicted in Figure 7. In the
physiological coherence feature module, starting with latent vectors for each PPG and EMG
signal, emotion-related features were extracted using the following process. Moreover, the
features extracted in this way are complementary between the PPG and EMG signals and contain information about the overall details over time.
First, effective features related to emotions were obtained through a 1D convolutional
layer in each latent vector of the PPG and EMG signals. Complementary features of the PPG
and EMG signals were then extracted by concatenating each feature and passing through
the 1D convolutional layer again. When extracting features from the PPG and EMG signals,
after the first 1D convolutional layer of each process, batch normalization and max pooling
were performed to solve the problem of internal covariate shift and to transfer strong
features with emotional characteristics to the next layer. However, while performing max
pooling and passing strong features with emotional characteristics to the next layer, delicate
representations may be discarded which can capture sophisticated emotional information.
Therefore, only batch normalization was performed at the second convolutional layer. In a
situation where features are fused through concatenation, this could be performed after
arranging features in a row through flattening, as shown in Figure 8a. However, we did
not employ flattening to preserve the overall details over time (also known as temporal
information). Instead, temporal-wise concatenation was performed to ensure that it could
be fused by time steps, as shown in Figure 8b.

Figure 7. Architecture of the physiological coherence feature module.

(a) Feature fusion using flattening (b) Feature fusion without flattening

Figure 8. Effects of flattening on feature fusion.
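The two-stream coherence module of Figure 7 could be sketched as follows in Keras; the filter counts and kernel sizes are illustrative assumptions. The points reproduced from the description above are that batch normalization follows each convolution, max pooling is applied only after the first convolution, and the PPG and EMG feature maps are fused along the channel axis at each time step rather than being flattened first (Figure 8b).

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_coherence_module(latent_len, latent_ch):
    """Two-stream 1D CNN over the PPG and EMG latent codes (Figure 7)."""
    def stream(name):
        inp = layers.Input(shape=(latent_len, latent_ch), name=f"{name}_latent")
        x = layers.Conv1D(32, 3, padding="same", activation="relu")(inp)
        x = layers.BatchNormalization()(x)
        x = layers.MaxPooling1D(2)(x)                         # pass strong emotional features onward
        x = layers.Conv1D(64, 3, padding="same", activation="relu")(x)
        x = layers.BatchNormalization()(x)                    # no pooling here, to keep delicate detail
        return inp, x

    ppg_in, ppg_feat = stream("ppg")
    emg_in, emg_feat = stream("emg")

    # Temporal-wise fusion: concatenate channels at each time step (no flattening).
    fused = layers.Concatenate(axis=-1)([ppg_feat, emg_feat])
    coherence = layers.Conv1D(64, 3, padding="same", activation="relu")(fused)

    return models.Model([ppg_in, emg_in], coherence, name="coherence_feature_module")
```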



2.4. Physiological Common Feature Module


There are two main approaches to extracting features from physiological signals for emotion recognition. One is statistical feature extraction, which computes hand-crafted statistical descriptors of the signal. The other is deep-learning-based feature extraction, which learns features through a deep learning model.
Statistical features (also known as hand-crafted features) are less reliable because a person must judge which features are necessary: features believed to be related to the target task are selected, although it is unclear whether they are actually relevant to that task. Therefore, deep-learning-based feature extraction is currently used extensively. Although it can automatically extract large amounts of important, emotion-related information from the signal, it has the disadvantage of not providing information about the frequency band.
In order to compensate for the disadvantages of each method, the results of the STFT
were applied to a deep learning model to extract features that also included the information
of the frequency band in addition to information about overall details over time.

2.4.1. Signal Conversion from Time Domain to Time-Frequency Domain Using the
Short-Time Fourier Transform
The STFT is a Fourier-related transform used to determine the sinusoidal frequency
and phase content of local sections of a signal as it changes over time [29]. Although
the fast Fourier transform (FFT) can clearly indicate the frequency content of a signal, it cannot show how that content changes over time. In contrast, the STFT can easily track frequency over time because it performs a Fourier transform on successive short sections of the signal.
The process of the STFT is depicted in Figure 9. Here, the hop length is the length that
the window jumps from the current section to the next section, and the overlap length is
the overlapping length between the current window and the next window. The resulting
value of the STFT comprises a complex number that contains information on both the
magnitude and phase. Therefore, the use of complex numbers is inevitable when using
both the magnitude and phase information of the STFT as shown in Figure 10.

Figure 9. Process of the STFT.
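The STFT input to the common feature module could be produced as in the short SciPy sketch below. The sampling rate, segment length, and overlap here are illustrative placeholders; the exact STFT parameters are not reported in this section.

```python
from scipy.signal import stft

def pulse_sequence_to_stft(pulse_sequence, fs=128, nperseg=32, noverlap=24):
    """Complex STFT of a concatenated pulse sequence.

    Returns an array of shape (freq_bins, time_frames) whose complex entries
    carry both magnitude and phase; the hop length is nperseg - noverlap.
    """
    _, _, Z = stft(pulse_sequence, fs=fs, nperseg=nperseg, noverlap=noverlap)
    # np.abs(Z) and np.angle(Z) give the magnitude and phase separately if needed;
    # the complex values themselves are kept for the complex-valued network.
    return Z
```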

2.4.2. Complex-Valued Convolutional Neural Network (CVCNN)


In general neural networks, neurons have weights, inputs, and outputs in the real
domain, and these neural networks are called real-valued neural networks (RVNNs).
Moreover, each neuron constituting an RVNN is called a real-valued neuron (RVN). The
complex-valued result of the STFT mentioned in Section 2.4.1 cannot be handled by RVNs. Therefore, to process complex values in deep learning, the complex-valued neuron (CVN) and complex-valued neural network (CVNN) are necessary.

Figure 10. STFT process for the physiological common feature module.

The CVN has the same structure as an RVN, as depicted in Figure 11. However, the
weights, inputs, and outputs of CVNs all exist in the complex domain. Therefore, they can
be applied to various fields that use a complex system [30,31].

(a) Real-valued neuron (RVN) (b) Complex-valued neuron (CVN)

Figure 11. Structure of an RVN and a CVN.

A real-valued convolution operation takes a matrix and a kernel (a smaller matrix)


and outputs a matrix. The matrix elements are computed using a sliding window with the
same dimensions as the kernel and each element is the sum of the point-wise multiplication
of the kernel and matrix patch at the corresponding window.
Herein, we use the dot product to represent the sum of a point-wise multiplication
between two matrices

X · A = Σ_ij X_ij A_ij    (6)

In the complex generalization, both the kernel and input patch are complex values. The
only difference stems from the nature of multiplying complex numbers. When convolving
a complex matrix with the kernel W = A + iB, the output corresponding to the input patch
Z = X + iY is given by

Z · W = ( X · A − Y · B) + i( X · B + Y · A) (7)

To implement the same functionality with a real-valued convolution, the input and
output should be equivalent. Each complex matrix is represented by two real matrices
stacked together in a three-dimensional array. Denoting this array [X, Y], it is equivalent to
X + iY. X and Y are the array’s channels (Figure 12).
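Equation (7) maps directly onto four real-valued convolutions over stacked real and imaginary channels. The sketch below is one way this could be written in TensorFlow; the layer name, padding choice, and channel layout are assumptions for illustration.

```python
import tensorflow as tf
from tensorflow.keras import layers

class ComplexConv2D(layers.Layer):
    """Complex-valued 2D convolution implementing Equation (7).

    The input is a real tensor whose last axis stacks the real and imaginary
    channels [X, Y] (equivalent to Z = X + iY). Two real kernels A and B play
    the role of W = A + iB, giving output channels
    Re = X*A - Y*B and Im = X*B + Y*A.
    """

    def __init__(self, filters, kernel_size, **kwargs):
        super().__init__(**kwargs)
        self.conv_a = layers.Conv2D(filters, kernel_size, padding="same")  # real part of W
        self.conv_b = layers.Conv2D(filters, kernel_size, padding="same")  # imaginary part of W

    def call(self, inputs):
        # inputs: (..., H, W, 2*C) with the real channels first, imaginary channels second
        x, y = tf.split(inputs, num_or_size_splits=2, axis=-1)
        real = self.conv_a(x) - self.conv_b(y)      # X·A − Y·B
        imag = self.conv_b(x) + self.conv_a(y)      # X·B + Y·A
        return tf.concat([real, imag], axis=-1)

# Illustrative usage on a complex spectrogram Z (freq x time):
# spec = np.stack([Z.real, Z.imag], axis=-1)[None, ...].astype("float32")  # shape (1, F, T, 2)
# out = ComplexConv2D(filters=16, kernel_size=3)(spec)
```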

2.4.3. Feature Extraction of Physiological Common Features for PPG and EMG Signals
As mentioned at the beginning of Section 2.4, each individual feature extraction method has shortcomings. Therefore, in this section, to address these shortcomings, features were extracted while preserving both the general details of the signals over time and the frequency-band information, by applying the STFT and CVNN as explained previously.
Figure 13 shows the structure of the proposed physiological common feature module.

Figure 12. Process of complex-valued convolution.

As shown in Figure 13, the common features of the two biosignals (PPG and EMG)
were extracted in this study, because extracting the features of PPG and EMG separately is
an inherently inefficient method. In other words, selecting individual features provides too
much input data in single-task learning and creates the possibility that each feature would
adversely affect other features and interfere with the task to be performed. In addition, the
resultant value of the STFT is composed of complex numbers and includes information
on both magnitude and phase. Therefore, to use both the intensity and phase information
of the STFT, the use of complex numbers is inevitable. Accordingly, a CVNN was used to
extract the features.

Figure 13. Structure of the physiological common feature module.

For these reasons, we propose to extract the common features of the PPG and EMG signals through a CVNN instead of extracting separate features for each signal.
Therefore, the total structure of our proposed real-time emotion recognition system
can be represented as shown in Figure 14.

Figure 14. Total structure of the proposed emotion recognition system.
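As a rough sketch of how the outputs of the two modules might be fused for classification (Figure 14), assuming both modules are Keras models: the features are flattened, concatenated, and passed through dense layers with separate arousal and valence heads. The head sizes, dropout rate, and two-head design are assumptions, not the exact classifier reported in the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_emotion_classifier(coherence_model, common_model, n_classes=2):
    """Fuse the coherence and common feature modules and classify emotion."""
    fused = layers.Concatenate()([
        layers.Flatten()(coherence_model.output),
        layers.Flatten()(common_model.output),
    ])
    x = layers.Dense(128, activation="relu")(fused)
    x = layers.Dropout(0.3)(x)
    arousal = layers.Dense(n_classes, activation="softmax", name="arousal")(x)
    valence = layers.Dense(n_classes, activation="softmax", name="valence")(x)

    model = models.Model(coherence_model.inputs + common_model.inputs, [arousal, valence])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```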

3. Experimental Results
3.1. Datasets
It is important to decide which dataset to use for a study since the type and characteris-
tics of a dataset have a significant influence on the results. In particular, datasets containing
only physiological signals (not image-generated datasets) are required for emotion recogni-
tion through physiological signals. We required a dataset containing PPG and EMG signals;
thus, among the available datasets, we chose the DEAP dataset [12]. Moreover, we created
a dataset, EDPE, for more granular emotions (as used in a previous study [32]).
Emotions can be affected by many factors, and each emotion has fuzzy boundaries.
Therefore, it is ambiguous to quantify emotions or define them using objective criteria. Var-
ious models that define emotion have been developed, although most emotion recognition
studies use Russell’s circumplex theory [33], which assumes emotions are distributed in a
two-dimensional circular space with arousal and valence dimensions. Generally, arousal
is considered as the vertical axis and valence the horizontal, with the origin (circle center)
representing neutral valence and medium arousal level.
As shown in Figure 15, emotional states can be represented at any valence and arousal
level. For example, “Excited” has high arousal and high valence, whereas “Depressed” has
low arousal and low valence. Emotions can manifest in various ways, and current emotion
recognition systems are generally based on facial expressions, voice, gestures, and text.

3.1.1. Database for Emotion Analysis Using Physiological Signals (DEAP)


The DEAP dataset contains 32-channel EEG and peripheral physiological signals
(including PPG and EMG). Furthermore, these signals were measured from a total of
32 participants (16 male and 16 female) who watched 40 music videos and self-assessed on
five criteria (including arousal and valence). Each participant’s age was within the range of
19–37 years (average of 26.9 years), and self-evaluation was on a continuous scale from 1 to
9, except for familiarity (which was a discrete scale from 1 to 5). Thirty-two participants
first put on a device that can collect the signals and started the device three seconds before
watching the video to measure the signals when they were in a calm state. After that, they
watched the videos and started the self-assessment after the video was finished. This step
was repeated to collect the signals. The signals were measured at 512 Hz and then downsampled to 128 Hz. The dataset is summarized in Table 1.

Figure 15. Russell’s circumplex model [33].

Table 1. DEAP dataset summary.

DEAP Dataset | Experiment
Participants | 32 (male: 16, female: 16)
Videos | 40 music videos
Age | Between 19 and 37
Rating categories | Arousal, Valence, Dominance, Liking, Familiarity
Rating values | Familiarity: discrete scale of 1–5; others: continuous scale of 1–9
Recorded signals | 32-channel EEG; peripheral physiological signals; face video (only for 22 participants)
Sampling rate | 512 Hz (or downsampled to 128 Hz)

3.1.2. Emotion Dataset Using PPG and EMG Signals (EDPE)


The EDPE dataset comprises a total of 40 participants (30 males and 10 females) who watched 32 videos that evoked specific emotions and then self-evaluated their arousal and valence. Each video lasted 3–5 min, and the total duration of the experiment was 2.5–3.0 h. Each participant's age was within the range of 20–28 years, and the self-assessment used a four-step discrete scale of −2, −1, +1, +2 for both arousal and valence. Through these four-step self-assessments, emotions are classified into the 16 areas shown in Figure 16 rather than four areas, which makes it possible to recognize emotions at the finer level of adjective-defined emotions. The overall experimental process is as follows. First, participants attach the sensors and wait in a normal state for 10 min without the signals being measured. After that, they watch videos corresponding to the four quadrants of Russell's model while the signals are measured. After each video finishes, they complete a self-assessment. The measured signals are PPG and EMG, sampled at 100 Hz, as summarized in Table 2.

Figure 16. Proposed emotion plane (valence–arousal plane).

Table 2. EDPE dataset summary.

EDPE Dataset | Experiment
Participants | 40 (male: 30, female: 10)
Videos | 32 videos
Age | Between 20 and 28
Rating categories | Arousal, Valence
Rating values | Discrete scale of −2, −1, +1, +2
Proposed emotion states | 16 emotions depicted in Figure 16
Recorded signals | PPG, EMG
Sampling rate | 100 Hz
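For the EDPE labels, the four-step arousal and valence ratings jointly index one of the 16 cells of the emotion plane in Figure 16. A minimal sketch of such a mapping follows; the particular cell ordering is an illustrative assumption.

```python
def emotion_cell(valence, arousal):
    """Map a (valence, arousal) self-assessment on the scale {-2, -1, +1, +2}
    to one of the 16 cells of the valence-arousal plane (Figure 16).

    Cells are numbered 0-15 row by row, from low to high arousal and low to
    high valence; this particular ordering is an illustrative assumption.
    """
    scale = [-2, -1, 1, 2]
    if valence not in scale or arousal not in scale:
        raise ValueError("ratings must be one of -2, -1, +1, +2")
    return scale.index(arousal) * 4 + scale.index(valence)

# Example: high valence (+2) and high arousal (+1) falls in a Quadrant I cell.
# label = emotion_cell(valence=2, arousal=1)
```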

3.2. Experimental Setup


The experiment was conducted by setting the single-pulse lengths to 86 and 140 data
points for the DEAP and EDPE datasets, respectively. Each sample was preprocessed
through MATLAB (R2020b), and learning was conducted using TensorFlow (2.6.0) and Keras (2.6.0). In the fields of cognitive engineering, HCI (emotion engineering and medicine) and BCI (perception of an individual's emotional state) are very important; hence, the experiment in this study was not conducted in a subject-independent way. Therefore, a randomly selected 80% of the samples from the DEAP dataset was used for training and the remaining 20% for testing, and likewise for the EDPE dataset.
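The random 80/20 split could be reproduced as below with scikit-learn (the paper only states the split ratio; the use of train_test_split, the random seed, and the placeholder arrays are assumptions for illustration).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data standing in for pulse-window samples and their labels.
X = np.random.randn(1000, 15 * 86)        # e.g., 15-pulse windows of 86 samples each
y = np.random.randint(0, 2, size=1000)    # binary high/low labels for arousal (or valence)

# Subject-dependent split: a random 80% of samples for training, 20% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)
```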
In addition, the accuracy of each pulse number was measured to confirm how many
pulses from each dataset were suitable for recognizing emotions. Based on the appropriate
number of pulses (confirmed from the experiment), the performances of the algorithm
proposed in this study and other algorithms were compared. Finally, by learning each
of the 16 emotions classified by arousal and valence, an experiment was conducted to
determine how well they could be measured in the emotion class.

3.3. Classification Results on the DEAP Dataset


Figure 17 shows the average accuracies of valence and arousal according to the number
of pulses in the DEAP dataset. When the DEAP dataset contained only a single pulse, the
accuracy was very low (47%). However, as the number of pulses increased from 1 to 10, the
accuracy increased rapidly, and beyond 10 pulses it remained roughly constant. Although the accuracy increased rapidly up to 10 pulses, the ideal number of pulses for the DEAP dataset was set to 15, because this produced the optimum performance (75%).

Figure 17. Classification accuracy of DEAP dataset according to pulse length.

Table 3 shows a comparison of the results of the emotion recognition using the DEAP
dataset. The proposed method exhibited an accuracy of 75.76% and 74.32% in arousal and
valence. Except for studies [10,34,35], it can be seen that the proposed method shows the
best performance and is also superior in terms of the recognition interval. Compared to the
proposed method, the study [34] performs poorly in valence, but outperforms in arousal.
Conversely, study [10] performs well in valence, but does not perform well in arousal. In
the case of study [35], it can be seen that both valence and arousal perform better than the
proposed method. However, it is difficult to make an appropriate comparison because the
three studies have a longer recognition interval than the proposed method.

Table 3. Comparison with other studies using the DEAP dataset.

Method | Recognition Interval | Signals | Arousal | Valence
Naïve Bayes with Statistical Features (Koelstra, 2011) [12] | 63 s | GSR, RSP, SKT, PPG, EMG, EOG | 57% | 62.7%
CNN (Martinez, 2013) [21] | 30 s | BVP, SC | 69.1% | 63.3%
DBN (Xu, 2016) [36] | 60 s | EEG | 69.8% | 66.9%
Deep Sparse AE (Zhang, 2017) [34] | 20 s | RSP | 80.78% | 73.06%
MEMD (Mert, 2018) [37] | 60 s | EEG | 75% | 72.87%
SAE-LSTM (Xing, 2019) [10] | 60 s | EEG | 74.38% | 81.1%
HOLO-FM (Topic, 2021) [35] | 60 s | EEG | 77.72% | 76.61%
Proposed Method | 15 s | PPG, EMG | 75.76% | 74.32%
Therefore, as shown in Table 4, the performance was compared again by matching the recognition interval to the same 15 s as the proposed method. Study [10] used LSTM
to construct the model according to relatively long-term signals; thus, it seems that the
performance has decreased significantly compared to studies [34,35]. Therefore, when
comparing the proposed method and research [10,34,35] under the same conditions, the
proposed method shows the best performance.

Table 4. Re-comparison with the top-3 studies in Table 3 (with the recognition interval set to 15 s).

Method | Recognition Interval | Signals | Arousal | Valence
Deep Sparse AE (Zhang, 2017) [34] | 15 s | RSP | 69.8% | 70.67%
SAE-LSTM (Xing, 2019) [10] | 15 s | EEG | 54.46% | 50.98%
HOLO-FM (Topic, 2021) [35] | 15 s | EEG | 70.54% | 72.32%
Proposed Method | 15 s | PPG, EMG | 75.76% | 74.32%

3.4. Classification Results on the EDPE Dataset


Figure 18 shows the average accuracies of valence and arousal according to the number
of pulses in the EDPE dataset. When the EDPE dataset contained only a single pulse, the
accuracy was very low (46%). However, as the number of pulses increased from 1 to 10,
the accuracy increased rapidly, although, beyond that number, the accuracy decreased.
Therefore, for the EDPE dataset, the performance was optimum (85%) when it contained 10
pulses. Accordingly, the ideal number of pulses in the EDPE dataset was set to 10.

Figure 18. Classification accuracy of the EDPE dataset according to pulse length.

Figure 19 shows the confusion matrices for arousal and valence when the number of pulses in the EDPE dataset was set to 10. Most counts lie on the main diagonal, where the predictions match the labels, indicating that the learning was successful. The relatively high off-diagonal counts occur between adjacent levels, <Very High–High> and <Very Low–Low>: Very High was sometimes predicted as High (and vice versa), and Very Low was sometimes predicted as Low (and vice versa). Even with these cases of confusion, the overwhelming majority of correct predictions demonstrates the excellent classification performance of the method proposed in this study.
Experiments were also conducted with various deep learning models based on a
CNN and LSTM (commonly used in deep learning models) using the same EDPE dataset.
Although CNNs are one of the most-used deep neural networks for analyzing visual images,
they have frequently been employed in recent emotion recognition research by analyzing
patterns of adjacent physiological signals. Therefore, we compared the performance of
CNNs and models that combined a stacked autoencoder and a CNN or LSTM. Finally,
the performance of the bimodal stacked sparse autoencoder [32] was compared. Table 5
summarizes the experimental results of emotion recognition.

(a) Confusion matrix—arousal (b) Confusion matrix—valence

Figure 19. Confusion Matrix of classification result—EDPE dataset.

As shown in Table 5, the performance was low when recognizing emotions using
LSTM. This result indicated that the data were not just time dependent but also more
complex. Therefore, this suggested that improved results could be obtained by analyzing
data patterns using a fully connected layer and a convolutional layer. As a result, our
proposed model outperformed the other deep learning models.

Table 5. Comparison with various deep learning models.

Model | Dataset | Recognition Interval | Arousal | Valence
CNN | EDPE dataset | 10 s | 70.24% | 74.34%
Stacked Auto-encoder + CNN | EDPE dataset | 10 s | 71.47% | 72.01%
Stacked Auto-encoder + LSTM | EDPE dataset | 10 s | 61.03% | 59.25%
Bimodal-Stacked Auto-encoder [32] | EDPE dataset | 10 s | 75.86% | 80.18%
Proposed Model | EDPE dataset | 10 s | 84.84% | 86.50%

Recognizing the highs and lows of arousal and valence has a very different meaning
from recognizing emotion itself. Being able to recognize arousal well does not necessarily
mean that valence can also be recognized well, and vice versa. In other words, recognizing emotions, in which the arousal and valence criteria are applied simultaneously as shown in Figure 16, is a more complicated and difficult problem than recognizing high and low levels of arousal or valence alone. Therefore, to recognize emotions, we reconstructed
the EDPE dataset with data and labels for each of the 16 emotions in Figure 16, and
training and testing were conducted by dividing the sample into 80% for training and 20%
for testing.
Table 6 presents the recognition results for the 16 emotions, which displayed an
average recognition accuracy of 82.52%. Although this result was slightly lower than the
recognition accuracy for arousal and valence, it was sufficiently accurate to be applied
successfully in real-life scenarios, considering the difficulty of recognizing 16 emotions
compared to each recognition task for arousal and valence.

Table 6. Results of emotion recognition for sixteen emotions by the proposed model.

Quadrant | Emotions
Quadrant I (HVHA) | Astonished 85.37% | Convinced 87.09% | Excited 81.34% | Delighted 80.20%
Quadrant II (LVHA) | Distress 78.35% | Disgust 80.97% | Annoyed 75.26% | Impatient 82.97%
Quadrant III (LVLA) | Sad 79.61% | Anxious 82.89% | Worried 77.49% | Bored 90.04%
Quadrant IV (HVLA) | Confident 85.08% | Serious 83.24% | Pleased 86.87% | Calm 83.40%
HVHA: High Valence High Arousal; LVLA: Low Valence Low Arousal; LVHA: Low Valence High Arousal; HVLA: High Valence Low Arousal.

4. Conclusions
This paper proposed a novel approach for real-time emotion recognition using physio-
logical signals (PPG and EMG) through the extraction of physiologically common features
via a CVCNN. The results indicated that the proposed approach achieved an accuracy of
81.78%, which is competitive with existing methods. Furthermore, we confirmed that the
recognition interval was significantly shorter than in other studies, rendering the proposed
method suitable for real-time emotion recognition.
The findings of this study suggest that the proposed approach has the potential
to be applied in various fields, such as healthcare, human–computer interactions, and
affective computing. Moreover, this study provides insights into the relationship between
physiological signals and emotions, which can further advance our understanding of the
human affective system.
While the proposed approach shows promise in real-time emotion recognition using
physiological signals, there are some limitations. Firstly, the concept of cross-subject
analysis, which involves analyzing data from multiple subjects, is not incorporated in this
study. This limits the generalizability of the findings to a broader population. Next, the
experiments were conducted in a controlled laboratory setting, which may not fully capture
the range of emotions experienced in real-life situations. Therefore, there is a need for
future research to address these limitations.
In light of these limitations, future research should consider conducting experiments in uncontrolled, in-the-wild environments to better assess the applicability of the proposed approach in real-world scenarios. This would provide a more comprehensive understanding of how
emotions manifest in different contexts. In addition, by understanding the properties of
the matrix, which is the result of an STFT, it is possible to derive novel approaches such as
spectrograms [38] or graph transformer models [39,40]. Furthermore, it is important to
expand the scope of the investigation beyond short-term emotion recognition. Long-term
emotion recognition should be explored to gain insights into how emotions evolve and
fluctuate over extended periods of time.
Moreover, future research could focus on defining and recognizing personality traits
based on changes in emotions. By studying the relationship between emotions and person-
ality, we can gain a deeper understanding of the human affective system. This would not
only contribute to the field of affective computing, but also have practical implications in
various domains, such as healthcare and human–computer interactions.
In summary, by addressing the limitations related to cross-subject analysis and con-
ducting experiments in real-life settings, future research can enhance the applicability and
generalizability of the proposed approach. Additionally, exploring long-term emotion
recognition and its connection to personality traits would provide valuable insights into
the complex nature of human emotions.

Author Contributions: Conceptualization, E.-G.H., T.-K.K. and M.-T.L.; data curation, E.-G.H.;
formal analysis, E.-G.H., T.-K.K. and M.-T.L.; methodology, E.-G.H., T.-K.K. and M.-T.L.; software,
E.-G.H. and T.-K.K.; validation, M.-T.L. and T.-K.K.; writing—original draft, E.-G.H. and T.-K.K.;
writing—review and editing, M.-T.L. and T.-K.K. All authors have read and agreed to the published
version of the manuscript.
Funding: This research was supported by the Basic Science Research Program through the National
Research Foundation of Korea (NRF) (grant no. NRF-2022R1F1A1073543).
Data Availability Statement: Not applicable.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Ali, M.; Mosa, A.H.; Al Machot, F.; Kyamakya, K. Emotion recognition involving physiological and speech signals: A comprehensive
review. In Recent Advances in Nonlinear Dynamics and Synchronization; Springer: Berlin/Heidelberg, Germany, 2018; pp. 287–302.
2. Sim, H.; Lee, W.H.; Kim, J.Y. A Study on Emotion Classification utilizing Bio-Signal (PPG, GSR, RESP). Adv. Sci. Technol. Lett.
2015, 87, 73–77.
3. Chen, J.; Hu, B.; Moore, P.; Zhang, X.; Ma, X. Electroencephalogram-based emotion assessment system using ontology and data
mining techniques. Appl. Soft Comput. 2015, 30, 663–674. [CrossRef]
4. Shu, L.; Xie, J.; Yang, M.; Li, Z.; Li, Z.; Liao, D.; Xu, X.; Yang, X. A review of emotion recognition using physiological signals.
Sensors 2018, 18, 2074. [CrossRef] [PubMed]
5. Houssein, E.H.; Hammad, A.; Ali, A.A. Human emotion recognition from EEG-based brain–computer interface using machine
learning: A comprehensive review. Neural Comput. Appl. 2022, 34, 12527–12557. [CrossRef]
6. Al-Qazzaz, N.K.; Alyasseri, Z.A.A.; Abdulkareem, K.H.; Ali, N.S.; Al-Mhiqani, M.N.; Guger, C. EEG feature fusion for motor
imagery: A new robust framework towards stroke patients rehabilitation. Comput. Biol. Med. 2021, 137, 104799. [CrossRef]
[PubMed]
7. Sung, W.T.; Chen, J.H.; Chang, K.W. Study on a real-time BEAM system for diagnosis assistance based on a system on chips
design. Sensors 2013, 13, 6552–6577. [CrossRef]
8. Wen, T.; Zhang, Z. Deep convolution neural network and autoencoders-based unsupervised feature learning of EEG signals.
IEEE Access 2018, 6, 25399–25410. [CrossRef]
9. Alhagry, S.; Fahmy, A.A.; El-Khoribi, R.A. Emotion recognition based on EEG using LSTM recurrent neural network. Int. J. Adv.
Comput. Sci. Appl. 2017, 8, 355–358. [CrossRef]
10. Xing, X.; Li, Z.; Xu, T.; Shu, L.; Hu, B.; Xu, X. SAE + LSTM: A New framework for emotion recognition from multi-channel EEG.
Front. Neurorobot. 2019, 13, 37. [CrossRef]
11. Soleymani, M.; Lichtenauer, J.; Pun, T.; Pantic, M. A multimodal database for affect recognition and implicit tagging. IEEE Trans.
Affect. Comput. 2011, 3, 42–55. [CrossRef]
12. Koelstra, S.; Muhl, C.; Soleymani, M.; Lee, J.S.; Yazdani, A.; Ebrahimi, T.; Pun, T.; Nijholt, A.; Patras, I. Deap: A database for
emotion analysis; using physiological signals. IEEE Trans. Affect. Comput. 2011, 3, 18–31. [CrossRef]
13. Zheng, W.L.; Zhu, J.Y.; Lu, B.L. Identifying stable patterns over time for emotion recognition from EEG. IEEE Trans. Affect.
Comput. 2017, 10, 417–429. [CrossRef]
14. Lin, Y.P.; Wang, C.H.; Jung, T.P.; Wu, T.L.; Jeng, S.K.; Duann, J.R.; Chen, J.H. EEG-based emotion recognition in music listening.
IEEE Trans. Biomed. Eng. 2010, 57, 1798–1806. [PubMed]
15. Chanel, G.; Kronegg, J.; Grandjean, D.; Pun, T. Emotion assessment: Arousal evaluation using EEG’s and peripheral physiological
signals. In Proceedings of the Multimedia Content Representation, Classification and Security: International Workshop, MRCS
2006, Istanbul, Turkey, 11–13 September 2006; Proceedings; Springer: Berlin/Heidelberg, Germany, 2006; pp. 530–537.
16. Udovičić, G.; Ðerek, J.; Russo, M.; Sikora, M. Wearable emotion recognition system based on GSR and PPG signals. In Proceedings
of the 2nd International Workshop on Multimedia for Personal Health and Health Care, Mountain View, CA, USA, 23 October
2017; pp. 53–59.
17. Li, C.; Xu, C.; Feng, Z. Analysis of physiological for emotion recognition with the IRS model. Neurocomputing 2016, 178, 103–111.
[CrossRef]
18. Lee, Y.K.; Kwon, O.W.; Shin, H.S.; Jo, J.; Lee, Y. Noise reduction of PPG signals using a particle filter for robust emotion recognition.
In Proceedings of the 2011 IEEE International Conference on Consumer Electronics—Berlin (ICCE—Berlin), Berlin, Germany, 3–6
September 2011; pp. 202–205.
19. Noroznia, H.; Gandomkar, M.; Nikoukar, J.; Aranizadeh, A.; Mirmozaffari, M. A Novel Pipeline Age Evaluation: Considering
Overall Condition Index and Neural Network Based on Measured Data. Mach. Learn. Knowl. Extr. 2023, 5, 252–268. [CrossRef]
20. Mirmozaffari, M.; Yazdani, M.; Boskabadi, A.; Ahady Dolatsara, H.; Kabirifar, K.; Amiri Golilarz, N. A novel machine learning
approach combined with optimization models for eco-efficiency evaluation. Appl. Sci. 2020, 10, 5210. [CrossRef]
21. Martinez, H.P.; Bengio, Y.; Yannakakis, G.N. Learning deep physiological models of affect. IEEE Comput. Intell. Mag. 2013,
8, 20–33. [CrossRef]

22. Ozbulak, U.; Gasparyan, M.; Rao, S.; De Neve, W.; Van Messem, A. Exact Feature Collisions in Neural Networks. arXiv 2022,
arXiv:2205.15763.
23. Wu, C.K.; Chung, P.C.; Wang, C.J. Representative segment-based emotion analysis and classification with automatic respiration
signal segmentation. IEEE Trans. Affect. Comput. 2012, 3, 482–495. [CrossRef]
24. Picard, R.W.; Vyzas, E.; Healey, J. Toward machine emotional intelligence: Analysis of affective physiological state. IEEE Trans.
Pattern Anal. Mach. Intell. 2001, 23, 1175–1191. [CrossRef]
25. Zeng, Z.; Pantic, M.; Roisman, G.I.; Huang, T.S. A survey of affect recognition methods: Audio, visual and spontaneous expressions.
In Proceedings of the 9th International Conference on Multimodal Interfaces, Aichi, Japan, 12–15 April 2007; pp. 126–133.
26. Masci, J.; Meier, U.; Cireşan, D.; Schmidhuber, J. Stacked convolutional auto-encoders for hierarchical feature extraction. In
Proceedings of the Artificial Neural Networks and Machine Learning—ICANN 2011: 21st International Conference on Artificial
Neural Networks, Espoo, Finland, 14–17 June 2011; Proceedings, Part I 21; Springer: Berlin/Heidelberg, Germany, 2011; pp.
52–59.
27. Wang, Y.; Xie, Z.; Xu, K.; Dou, Y.; Lei, Y. An efficient and effective convolutional auto-encoder extreme learning machine network
for 3d feature learning. Neurocomputing 2016, 174, 988–998. [CrossRef]
28. Huang, H.; Hu, X.; Zhao, Y.; Makkie, M.; Dong, Q.; Zhao, S.; Guo, L.; Liu, T. Modeling task fMRI data via deep convolutional
autoencoder. IEEE Trans. Med. Imaging 2017, 37, 1551–1561. [CrossRef] [PubMed]
29. Sejdic, E.; Djurovic, I.; Jiang, J. Time–frequency feature representation using energy concentration: An overview of recent
advances. Digit. Signal Process. 2009, 19, 153–183. [CrossRef]
30. Amin, M.F.; Murase, K. Single-layered complex-valued neural network for real-valued classification problems. Neurocomputing
2009, 72, 945–955. [CrossRef]
31. Zimmermann, H.G.; Minin, A.; Kusherbaeva, V.; Germany, M. Comparison of the complex valued and real valued neural
networks trained with gradient descent and random search algorithms. In Proceedings of ESANN 2011, Bruges, Belgium,
27–29 April 2011.
32. Lee, Y.K.; Pae, D.S.; Hong, D.K.; Lim, M.T.; Kang, T.K. Emotion Recognition with Short-Period Physiological Signals Using
Bimodal Sparse Autoencoders. Intell. Autom. Soft Comput. 2022, 32, 657–673. [CrossRef]
33. Russell, J.A. A circumplex model of affect. J. Personal. Soc. Psychol. 1980, 39, 1161. [CrossRef]
34. Zhang, Q.; Chen, X.; Zhan, Q.; Yang, T.; Xia, S. Respiration-based emotion recognition with deep learning. Comput. Ind. 2017,
92, 84–90. [CrossRef]
35. Topic, A.; Russo, M. Emotion recognition based on EEG feature maps through deep learning network. Eng. Sci. Technol. Int. J.
2021, 24, 1442–1454. [CrossRef]
36. Xu, H.; Plataniotis, K.N. EEG-based affect states classification using deep belief networks. In Proceedings of the IEEE 2016 Digital
Media Industry & Academic Forum (DMIAF), Santorini, Greece, 4–6 July 2016; pp. 148–153.
37. Mert, A.; Akan, A. Emotion recognition from EEG signals by using multivariate empirical mode decomposition. Pattern Anal.
Appl. 2018, 21, 81–89. [CrossRef]
38. Pusarla, N.; Singh, A.; Tripathi, S. Learning DenseNet features from EEG based spectrograms for subject independent emotion
recognition. Biomed. Signal Process. Control. 2022, 74, 103485. [CrossRef]
39. Yun, S.; Jeong, M.; Kim, R.; Kang, J.; Kim, H.J. Graph transformer networks. Adv. Neural Inf. Process. Syst. 2019, 32. Available
online: https://proceedings.neurips.cc/paper_files/paper/2019/file/9d63484abb477c97640154d40595a3bb-Paper.pdf (accessed
on 22 May 2023).
40. Dwivedi, V.P.; Bresson, X. A generalization of transformer networks to graphs. arXiv 2020, arXiv:2012.09699.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
