Real Time Emotion Recognition from Facial Expressions Using CNN Architecture

Mehmet Akif OZDEMIR1, Berkay ELAGOZ1, Aysegul ALAYBEYOGLU2, Reza SADIGHZADEH3 and Aydin AKAN1


1Department of Biomedical Engineering, 2Department of Computer Engineering, 3Business Administration
Izmir Katip Celebi University
Izmir, Turkey
makif.ozdemir@ikc.edu.tr, berkayelagoz@gmail.com, aysegul.alaybeyoglu@ikc.edu.tr, riza@taimaksan.com, aydin.akan@ikc.edu.tr

Abstract—Emotion is an important topic in fields such as biomedical engineering, psychology, neuroscience and health. Emotion recognition could be useful for the diagnosis of brain and psychological disorders. In recent years, deep learning has made great progress in image classification. In this study, we proposed a Convolutional Neural Network (CNN) based on the LeNet architecture for facial expression recognition. First, we merged 3 datasets (JAFFE, KDEF and our custom dataset). Then we trained our LeNet architecture to classify emotional states. We achieved a training accuracy of 96.43% and a validation accuracy of 91.81% for the classification of 7 different emotions through facial expressions.

Keywords—Convolutional Neural Network; Deep Learning; Emotion Recognition; Facial Expressions; Real Time Detection.

I. INTRODUCTION

Although there are many studies on emotion, the literature offers no common or single definition of emotion [1]. Emotion is the appearance or reflection of a feeling. Unlike a feeling, an emotion can be either real or sham. For example, a feeling of pain directly represents what is felt, whereas an emotion is not necessarily felt exactly. Emotions psychologically present inner states [2, 3].

Emotion is an important, complex and extensive research topic in the fields of biomedical engineering [4], psychology [5], neuroscience [6] and health [7]. Emotion detection is an important research area in biomedical engineering. Studies in this area focus on predicting human emotion and on computer-assisted diagnosis of psychological disorders. The literature describes different methods for detecting emotional states, such as electroencephalography (EEG), galvanic skin response (GSR), speech analysis, facial expression, multimodal approaches and visual scanning behavior [8-10].

In recent years, with the popularization of deep learning, great progress has been made in image classification. Convolutional neural networks (CNNs) are a type of artificial neural network proposed by Yann LeCun et al. [11]. CNNs are among the most popular deep learning architectures for image classification, recognition and segmentation.

CNNs are loosely modeled on the human brain: they consist of artificial neurons arranged in multiple hierarchical hidden layers. Each artificial neuron takes its input from the image, multiplies it by weights, adds a bias and then applies an activation function. Because these neurons perform simple convolutions, they can be used for image classification, recognition and segmentation. Training a CNN on more data (a large amount of data) can yield a better and more accurate deep learning model.

Deep learning based facial expression recognition is one of these methods for detecting the emotional state of a person (e.g., anger, fear, neutral, happiness, disgust, sadness and surprise). This method aims to detect facial expressions automatically in order to identify the emotional state with high accuracy. In this method, labeled facial images from a facial expression dataset are fed to a CNN, which is trained on these images. The trained CNN model then determines which facial expression is being performed.

Chang et al. used a ResNet-based CNN model to extract features from the Fer2013 and CK+ datasets. Their proposed complexity perception classification (CPC) algorithm was applied with different classifiers (Softmax, LinearSVM and RandomForest). CNN+Softmax with CPC achieved recognition accuracies of 71.35% and 98.78% on Fer2013 and CK+, respectively [12].

Clawson et al. proposed two human-centric CNN architectures for facial expression recognition on the CK+ dataset. CNN A consists of 1 convolutional layer and 1 max-pooling layer; CNN B consists of 2 convolutional layers and 2 max-pooling layers. These architectures were trained with an initial learning rate of 0.0001, 300 epochs and a batch size of 10. According to the results, the proposed model achieved 93.3% accuracy on CK+ images [13].

Nguyen et al. proposed a multi-level 18-layer CNN model similar to VGG. This model uses not only high-level features but also mid-level features. A plain CNN model reached 69.21% accuracy, while the proposed multi-level CNN model reached 73.03% accuracy on the Fer2013 dataset [14].

Cao et al. proposed a CNN model combining K-means clustering with an SVM classifier, which achieved 80.29% accuracy. K-means clustering determines the initial values of the CNN's convolution kernels, and the SVM classifier takes features from the trained CNN model to classify Fer2013 images [15].
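The neuron operation described earlier in this section (multiply the inputs by weights, add a bias, apply an activation function) and the way a convolution reuses that neuron across an image can be sketched in plain Python. The ReLU activation, the toy 3x3 image and the 2x2 kernel are illustrative assumptions, not values from this study.

```python
def relu(z: float) -> float:
    """Rectified linear unit: keeps positive values, maps negatives to 0."""
    return max(0.0, z)


def neuron(inputs: list[float], weights: list[float], bias: float) -> float:
    """One artificial neuron: weighted sum of inputs, plus bias, through ReLU."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return relu(z)


def conv2d_valid(image: list[list[float]], kernel: list[list[float]], bias: float) -> list[list[float]]:
    """Slide one neuron (shared kernel weights) over every kernel-sized patch.

    Like common CNN libraries, this computes cross-correlation (the kernel is not
    flipped) and uses no padding, so the output shrinks by kernel_size - 1 per axis.
    """
    kh, kw = len(kernel), len(kernel[0])
    flat_kernel = [w for row in kernel for w in row]
    out = []
    for i in range(len(image) - kh + 1):
        row_out = []
        for j in range(len(image[0]) - kw + 1):
            patch = [image[i + di][j + dj] for di in range(kh) for dj in range(kw)]
            row_out.append(neuron(patch, flat_kernel, bias))
        out.append(row_out)
    return out


# Hypothetical 3x3 "image" and 2x2 kernel, for illustration only.
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[0, 0],
          [0, 1]]  # selects the bottom-right pixel of each patch
print(conv2d_valid(image, kernel, bias=0.0))  # prints [[5.0, 6.0], [8.0, 9.0]]
```

Because every output position uses the same `flat_kernel` and `bias`, a convolutional layer needs far fewer parameters than a fully connected layer over the same image.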

978-1-7281-2420-9/19/$31.00 ©2019 IEEE


Ahmed et al. merged different facial expression datasets: CK, CK+, Fer2013, the MUG facial expression database, KDEF, AKDEF and KinFaceW-I/II. Data augmentation was applied to the merged dataset. The proposed CNN model consists of 3 convolutional layers with 32, 64 and 128 filters and a kernel size of 3x3. According to the results, the proposed model reached 96.24% accuracy [16].

Christou et al. proposed a 13-layer CNN model that was applied to the Fer2013 dataset and achieved 91.12% accuracy on the validation dataset [17].

Sajjanhar et al. worked on the CK+, JAFFE and FACES datasets. They trained and used pre-trained CNN models such as Inception-V3, VGG-16, VGG-19 and VGG-Face. According to the results, the highest accuracy (97.16%) was obtained with the VGG-19 model on the FACES dataset [18].

Chen et al. proposed a two-stage framework based on a Difference Convolutional Neural Network (DCNN) trained with the CK+ and BU-4DFE datasets. The results showed that the proposed model achieved 95.4% accuracy on CK+ and 77.4% on BU-4DFE [19].

In this study, we proposed a CNN based on the LeNet architecture for facial expression recognition to estimate human emotional states. We merged 3 different datasets (KDEF, JAFFE and our custom dataset). Then, the proposed LeNet architecture was trained with the final dataset to classify 7 emotional states (happy, sad, surprised, angry, disgust, afraid and neutral). The aim of the study is to obtain a deep learning model that achieves a higher accuracy rate for emotion recognition through facial expressions.

II. METHODS

A. Facial Expression Dataset

There are many open-access facial expression datasets in the literature. We used 3 facial expression datasets: JAFFE, KDEF and our custom dataset.

The JAFFE dataset contains 213 images with 7 facial expressions (happy, sad, surprised, angry, disgust, afraid and neutral). These images were taken from 10 Japanese female models [20]. Examples of images from the JAFFE dataset are shown in Fig. 1.

Fig. 1. Example of images from JAFFE facial expression dataset.

The KDEF dataset contains 4900 images with 7 facial expressions (happy, sad, surprised, angry, disgust, afraid and neutral). The participants are 35 males and 35 females. The dataset contains images taken from 5 different angles; only the straight (frontal) view was used in this study [21]. Examples of images from the KDEF dataset are shown in Fig. 2.

Fig. 2. Example of images from KDEF facial expression dataset.

Our custom dataset contains 140 images with 7 facial expressions (happy, sad, surprised, angry, disgust, afraid and neutral). The participants are 1 male and 1 female. Each participant performed each facial expression 10 times.

B. Image Preprocessing

Although the merged dataset contained approximately equal numbers of face images for each of the seven facial expressions, the images had different resolutions because they came from 3 different databases. Therefore, the face region was first detected in each picture using the Haar Cascade library. The detected rectangular face regions were then cropped and saved at the same size. In addition, the images were converted to 64x64 grayscale images before being fed to the neural network. This was done to avoid unnecessary computational load in the neural network.

C. Convolutional Neural Network Architecture

The proposed CNN architecture aims to learn quickly and effectively from the pixel values in the rectangular region containing the facial expression, and to answer queries quickly with the resulting deep neural network model. The proposed CNN structure is summarized in Fig. 3. The network mimics the LeNet structure for classifying 2D facial expression data and includes two convolutional layers, two max-pooling layers and one fully connected layer. Each convolutional layer, with a kernel size of 2x2, is followed by a max-pooling layer with a kernel size of 2x2 and a stride of 2. After the convolutional and max-pooling layers, each frame is fed to the fully connected layers, and a Softmax classifier assigns each frame to one of the seven facial emotional states.

Fig. 3. Proposed CNN model diagram for facial emotion recognition.

D. Network Training

For network training, the test size was set to 25%. The batch size was set to 32, and 500 epochs were found to be sufficient for the network parameters to converge. The learning rate was set to 10⁻³. All kernel sizes were set to 2x2, with a stride of 2 for the max-pooling layers. The numbers of filters in the two convolutional layers were 16 and 32, respectively. A summary of the proposed CNN architecture is shown in Table I.

E. Real Time Testing

After training, the proposed CNN model was tested in real time. First, human faces were detected with the Haar Cascade library in the 30 frames per second captured by the computer's camera. The detected face images were then sent to the model, and the classes they belong to were queried. Based on the predictions, the probability of the facial expression belonging to each class was shown on a
separate screen, and the emotion class with the highest probability was overlaid on the Haar Cascade frame. This process was repeated for each of the 30 frames captured per second from the real-time camera stream.

III. RESULT AND DISCUSSION

In this study, the Keras and TensorFlow libraries were used to train the LeNet CNN architecture and to predict emotional states with the proposed deep learning model. An Intel i7 8300 CPU was used for all experiments, including training on the custom dataset. The proposed LeNet CNN model was configured with the parameters given above. Fig. 4 shows the performance metrics (training accuracy and loss, validation accuracy and loss) of the proposed architecture during training and testing. According to the experimental results, the training loss was 0.0887, the training accuracy was 96.43%, the validation loss was 0.2725 and the validation accuracy was 91.81%. These results are better than those of the studies mentioned in the introduction [9,11,12,14,16].

Fig. 4. Performance metrics of proposed architecture.

TABLE I. SUMMARY OF PROPOSED CNN ARCHITECTURE

Layer (type)                    Output Shape          Param #
conv2d_1 (Conv2D)               (None, 64, 64, 20)    520
activation_1 (Activation)       (None, 64, 64, 20)    0
max_pooling2d_1 (MaxPooling2)   (None, 32, 32, 20)    0
conv2d_2 (Conv2D)               (None, 32, 32, 50)    25050
activation_2 (Activation)       (None, 32, 32, 50)    0
max_pooling2d_2 (MaxPooling2)   (None, 16, 16, 50)    0
flatten_1 (Flatten)             (None, 12800)         0
dense_1 (Dense)                 (None, 500)           6400500
activation_3 (Activation)       (None, 500)           0
dense_2 (Dense)                 (None, 7)             3507
activation_4 (Activation)       (None, 7)             0
Total params: 6,429,577
Trainable params: 6,429,577
Non-trainable params: 0

*Conv2D: 2D Convolutional Layer, MaxPooling2: 2D Max Pooling Layer
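Table I can be rebuilt as a Keras `Sequential` model. The sketch below is one possible reconstruction, not the authors' code. Its parameter counts imply 5x5 kernels, 20 and 50 filters and "same" padding: 20 × (5·5·1 + 1) = 520 and 50 × (5·5·20 + 1) = 25,050, which differs from the 2x2 kernels and 16/32 filters stated in Section II-D, so the sketch follows Table I. The ReLU activations, the Adam optimizer, the categorical cross-entropy loss and the random hold-out split are assumptions; the text specifies only the Softmax output, the 10⁻³ learning rate, the batch size of 32, the 500 epochs and the 25% test split.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

NUM_CLASSES = 7            # happy, sad, surprised, angry, disgust, afraid, neutral
INPUT_SHAPE = (64, 64, 1)  # 64x64 grayscale face crops (Section II-B)


def build_lenet() -> keras.Model:
    """LeNet-style CNN whose output shapes and parameter counts match Table I."""
    return keras.Sequential([
        keras.Input(shape=INPUT_SHAPE),
        # 5x5 kernels with "same" padding keep 64x64 (inferred from 520 params).
        layers.Conv2D(20, kernel_size=5, padding="same"),
        layers.Activation("relu"),                    # assumption: not stated
        layers.MaxPooling2D(pool_size=2, strides=2),  # 64x64 -> 32x32
        layers.Conv2D(50, kernel_size=5, padding="same"),
        layers.Activation("relu"),
        layers.MaxPooling2D(pool_size=2, strides=2),  # 32x32 -> 16x16
        layers.Flatten(),                             # 16 * 16 * 50 = 12800
        layers.Dense(500),
        layers.Activation("relu"),
        layers.Dense(NUM_CLASSES),
        layers.Activation("softmax"),                 # one probability per emotion
    ])


def compile_model(model: keras.Model) -> keras.Model:
    """Learning rate 1e-3 as in Section II-D; the Adam optimizer is an assumption."""
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=1e-3),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


def train(model: keras.Model, images: np.ndarray, one_hot_labels: np.ndarray, seed: int = 42):
    """Hold out 25% of the samples for testing, then train (batch 32, 500 epochs)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(images))
    n_test = int(np.ceil(0.25 * len(images)))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return model.fit(
        images[train_idx], one_hot_labels[train_idx],
        validation_data=(images[test_idx], one_hot_labels[test_idx]),
        batch_size=32,
        epochs=500,
    )


model = compile_model(build_lenet())
model.summary()
```

Almost all of the 6,429,577 parameters sit in `dense_1` (12800 × 500 + 500 = 6,400,500), so a reader reproducing the model will find the fully connected layer, not the convolutions, dominating memory.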
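The image preprocessing of Section II-B (crop the detected face, convert to grayscale, resize to 64x64) can be sketched with NumPy alone. In practice the face box would come from an OpenCV Haar cascade, whose `CascadeClassifier.detectMultiScale` returns boxes as `(x, y, w, h)`, and `cv2.cvtColor`/`cv2.resize` would do the conversion; this sketch takes the box as given to stay self-contained. The luma weights (0.299, 0.587, 0.114) are the ones OpenCV uses for color-to-gray conversion; the nearest-neighbour resize and the scaling to [0, 1] are assumptions, not details reported in the paper.

```python
import numpy as np

TARGET_SIZE = 64  # 64x64 network input, as in Section II-B


def to_gray(bgr: np.ndarray) -> np.ndarray:
    """BGR image (OpenCV channel order) -> grayscale using BT.601 luma weights."""
    b, g, r = bgr[..., 0], bgr[..., 1], bgr[..., 2]
    return 0.299 * r + 0.587 * g + 0.114 * b


def resize_nearest(img: np.ndarray, size: int) -> np.ndarray:
    """Nearest-neighbour resize of a 2-D array to size x size."""
    h, w = img.shape
    rows = np.arange(size) * h // size  # source row for each output row
    cols = np.arange(size) * w // size  # source column for each output column
    return img[rows[:, None], cols[None, :]]


def preprocess_face(bgr_frame: np.ndarray, box: tuple[int, int, int, int]) -> np.ndarray:
    """Crop a face box (x, y, w, h), convert to gray, resize to 64x64, scale to [0, 1].

    Returns a float32 array shaped (64, 64, 1), ready to be stacked into a batch.
    """
    x, y, w, h = box
    face = bgr_frame[y:y + h, x:x + w].astype(np.float32)
    small = resize_nearest(to_gray(face), TARGET_SIZE)
    return (small / 255.0)[..., np.newaxis].astype(np.float32)
```

Because every face, whatever its source resolution, ends up as the same 64x64x1 tensor, images from JAFFE, KDEF and the custom dataset can be mixed freely in one training batch.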
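The per-frame logic of the real-time test in Section II-E (detect faces, query the model, show all class probabilities, label the face with the most probable class) can be sketched independently of the camera. `detect_faces` and `predict` are hypothetical hooks: with OpenCV they would wrap `detectMultiScale` on frames read from `cv2.VideoCapture`, and the trained model's `predict` on the preprocessed crop. The class order in `EMOTIONS` is an assumption and must match the label encoding used during training.

```python
from typing import Callable, Iterable, Sequence

# Assumed class order; it must match the one-hot encoding used in training.
EMOTIONS = ("happy", "sad", "surprised", "angry", "disgust", "afraid", "neutral")

Box = tuple[int, int, int, int]  # (x, y, w, h), as returned by a Haar cascade


def top_emotion(probs: Sequence[float]) -> tuple[str, float]:
    """Return the emotion with the highest Softmax probability and that probability."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return EMOTIONS[best], probs[best]


def annotate_stream(
    frames: Iterable,
    detect_faces: Callable[[object], list[Box]],
    predict: Callable[[object, Box], Sequence[float]],
):
    """Classify every detected face in every frame.

    Yields (frame_index, box, per-class probabilities, top label) so a caller can
    show the probabilities on a separate panel and draw the top label on the box.
    """
    for i, frame in enumerate(frames):
        for box in detect_faces(frame):
            probs = predict(frame, box)
            label, _ = top_emotion(probs)
            yield i, box, dict(zip(EMOTIONS, probs)), label
```

Keeping detection and prediction as injected callables lets the same loop run on a live 30 fps camera stream or on a recorded list of frames for offline checks.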
According to the confusion matrix in Fig. 5, the proposed LeNet model is more accurate at predicting the surprised, fear and neutral emotional states and less accurate at predicting the sad emotional state.

Fig. 5. Confusion matrix of proposed architecture.

IV. CONCLUSION

This paper proposed a low-cost and functional method for real-time classification of seven different emotions from facial expressions based on the LeNet CNN architecture. In this study, a relatively small number of facial expression images was successfully used to train the CNN, which achieved high classification accuracy. Using the Haar Cascade library reduced the effect of irrelevant pixels outside the facial region. In addition, feeding single-channel (grayscale) pixel values to the network did not reduce the success rate, and it also reduced the training time and the number of network parameters. Using a custom database provided higher validation and test accuracy than training on the existing databases. The real-time test model can query every frame captured each second.

Emotion estimation from facial expressions is an area of interest for many researchers. It is hoped that this study will serve as a source for studies on the early detection of diseases from facial expressions and on consumer behavior analysis.

ACKNOWLEDGMENT

This work was supported by the Scientific Research Projects Coordinatorship of Izmir Katip Celebi University. Project number: 2019-ÖNAP-MÜMF-0003.

REFERENCES

[1] M. Cabanac, "What is emotion?," Behavioural Processes, vol. 60, pp. 69-83, 2002.
[2] R. Roberts, "What an Emotion Is: a Sketch," The Philosophical Review, vol. 97, 1988.
[3] E. Shouse, "Feeling, emotion, affect," M/C Journal, vol. 8, no. 6, p. 26, 2005.
[4] J. Zhao, X. Mao, and L. Chen, "Speech emotion recognition using deep 1D & 2D CNN LSTM networks," Biomedical Signal Processing and Control, vol. 47, pp. 312-323, 2019.
[5] J. M. B. Fugate, A. J. O'Hare, and W. S. Emmanuel, "Emotion words: Facing change," Journal of Experimental Social Psychology, vol. 79, pp. 264-274, 2018.
[6] J. P. Powers and K. S. LaBar, "Regulating emotion through distancing: A taxonomy, neurocognitive model, and supporting meta-analysis," Neuroscience & Biobehavioral Reviews, vol. 96, pp. 155-173, 2019.
[7] R. B. Lopez and B. T. Denny, "Negative affect mediates the relationship between use of emotion regulation strategies and general health in college-aged students," Personality and Individual Differences, vol. 151, p. 109529, 2019.
[8] S. Albanie, A. Nagrani, A. Vedaldi, and A. Zisserman, "Emotion recognition in speech using cross-modal transfer in the wild," pp. 292-301, 2018.
[9] K.-Y. Huang, C.-H. Wu, Q.-B. Hong, M.-H. Su, and Y.-H. Chen, "Speech Emotion Recognition Using Deep Neural Network Considering Verbal and Nonverbal Speech Sounds," in ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 5866-5870: IEEE.
[10] M. Degirmenci, M. A. Ozdemir, R. Sadighzadeh, and A. Akan, "Emotion Recognition from EEG Signals by Using Empirical Mode Decomposition," in 2018 Medical Technologies National Congress (TIPTEKNO), 2018, pp. 1-4.
[11] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-Based Learning Applied to Document Recognition," Proceedings of the IEEE, vol. 86, pp. 2278-2324, 1998.
[12] T. Chang, G. Wen, Y. Hu, and J. Ma, "Facial Expression Recognition Based on Complexity Perception Classification Algorithm," arXiv e-prints, 2018. Accessed on: February 01, 2018. Available: https://ui.adsabs.harvard.edu/abs/2018arXiv180300185C
[13] K. Clawson, L. Delicato, and C. Bowerman, "Human Centric Facial Expression Recognition," 2018.
[14] H.-D. Nguyen, S. Yeom, G.-S. Lee, H.-J. Yang, I. Na, and S. H. Kim, "Facial Emotion Recognition Using an Ensemble of Multi-Level Convolutional Neural Networks," International Journal of Pattern Recognition and Artificial Intelligence, 2018.
[15] T. Cao and M. Li, "Facial Expression Recognition Algorithm Based on the Combination of CNN and K-Means," presented at the Proceedings of the 2019 11th International Conference on Machine Learning and Computing, Zhuhai, China, 2019.
[16] T. Ahmed, S. Hossain, M. Hossain, R. Islam, and K. Andersson, "Facial Expression Recognition using Convolutional Neural Network with Data Augmentation," pp. 1-17, 2019.
[17] N. Christou and N. Kanojiya, "Human Facial Expression Recognition with Convolution Neural Networks," Singapore, 2019, pp. 539-545: Springer Singapore.
[18] A. Sajjanhar, Z. Wu, and Q. Wen, "Deep learning models for facial expression recognition," in 2018 Digital Image Computing: Techniques and Applications (DICTA), 2018, pp. 1-6: IEEE.
[19] J. Chen, Y. Lv, R. Xu, and C. Xu, "Automatic social signal analysis: Facial expression recognition using difference convolution neural network," Journal of Parallel and Distributed Computing, vol. 131, pp. 97-102, 2019.
[20] M. Lyons, S. Akamatsu, M. Kamachi, and J. Gyoba, "Coding Facial Expressions with Gabor Wavelets," pp. 200-205, 1998.
[21] D. Lundqvist, A. Flykt, and A. Öhman, "The Karolinska directed emotional faces (KDEF)," CD ROM from Department of Clinical Neuroscience, Psychology section, Karolinska Institutet, vol. 91, p. 630, 1998.
