Real Time Emotion Recognition From Facial Expressions Using CNN Architecture
Abstract—Emotion is an important topic in different fields such as biomedical engineering, psychology, neuroscience and health. Emotion recognition could be useful for the diagnosis of brain and psychological disorders. In recent years, deep learning has progressed greatly in the field of image classification. In this study, we propose a Convolutional Neural Network (CNN) based on the LeNet architecture for facial expression recognition. First, we merged three datasets (JAFFE, KDEF and our custom dataset). We then trained our LeNet architecture for emotion state classification. We achieved a training accuracy of 96.43% and a validation accuracy of 91.81% for the classification of seven different emotions from facial expressions.

Keywords—Convolutional Neural Network; Deep Learning; Emotion Recognition; Facial Expressions; Real Time Detection.

I. INTRODUCTION

Although there are many studies on emotion in the literature, there is no common or singular definition of emotion [1]. Emotion is the appearance or reflection of a feeling. Distinct from a feeling, an emotion can be either real or sham. For example, a feeling of pain directly represents the feeling itself, but emotions are not felt exactly; they present inner psychological states [2, 3].

Emotion is an important, complex and extensive research topic in the fields of biomedical engineering [4], psychology [5], neuroscience [6] and health [7]. Emotion detection is an important research area in biomedical engineering. Studies in this area focus on predicting human emotion and on computer-assisted diagnosis of psychological disorders. There are different methods in the literature for detecting emotional states, such as electroencephalography (EEG), galvanic skin response (GSR), speech analysis, facial expression, multimodal analysis and visual scanning behavior [8-10].

In recent years, with the popularization of deep learning, great progress has been made in image classification. The convolutional neural network (CNN) is an artificial neural network type proposed by Yann LeCun in 1998 [11]. Convolutional neural networks are among the most popular deep learning architectures for image classification, recognition and segmentation.

Convolutional neural networks are built, like the human brain, from artificial neurons and consist of hierarchical multiple hidden layers. These artificial neurons take input from an image, multiply it by weights, add a bias and then apply an activation function. In this way, artificial neurons can be used for image classification, recognition and segmentation by performing simple convolutions. By feeding a convolutional neural network with more data (a huge amount of data), a better and more accurate deep learning model can be achieved.

Deep learning based facial expression recognition is one of these methods for detecting the emotional state (e.g., anger, fear, neutral, happiness, disgust, sadness and surprise) of a human. This method aims to detect facial expressions automatically in order to identify the emotional state with high accuracy. Labeled facial images from a facial expression dataset are fed to a CNN, and the CNN is trained on these images. The trained CNN model then determines which facial expression is being performed.

Chang et al. used a ResNet-based CNN model to extract features from the Fer2013 and CK+ datasets. Their complexity perception classification (CPC) algorithm was applied with different classifiers (Softmax, linear SVM and random forest). CNN+Softmax with CPC achieved 71.35% and 98.78% recognition accuracy on Fer2013 and CK+, respectively [12].

Clawson et al. proposed two human-centric CNN architectures for facial expression recognition on the CK+ dataset. CNN A consists of 1 convolutional layer and 1 max pooling layer; CNN B consists of 2 convolutional layers and 2 max pooling layers. These architectures were trained with an initial learning rate of 0.0001, 300 epochs and a batch size of 10. According to the results, the proposed model achieved 93.3% accuracy on CK+ images [13].

Nguyen et al. proposed a multi-level 18-layer CNN model similar to VGG. This model uses not only high-level features but also mid-level features. The plain CNN model reached 69.21% accuracy, and the proposed multi-level CNN model reached 73.03% accuracy on the Fer2013 dataset [14].

Cao et al. proposed a CNN model combined with a K-means clustering idea and an SVM classifier, which achieved 80.29% accuracy. The K-means clustering model determines the initial values of the convolution kernels of the CNN, and the SVM layer takes features from the trained CNN model to classify Fer2013 images [15].
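The weighted-sum-plus-bias operation of a single artificial neuron described above can be sketched in plain Python (the patch, kernel and bias values below are illustrative assumptions, not values from the paper):

```python
def conv_neuron(patch, weights, bias):
    """One CNN 'neuron': weighted sum over an image patch, plus bias,
    followed by a ReLU activation. `patch` and `weights` are 2D lists
    of equal shape."""
    z = sum(p * w
            for row_p, row_w in zip(patch, weights)
            for p, w in zip(row_p, row_w)) + bias
    return max(z, 0.0)  # ReLU activation

# Toy 3x3 grayscale patch and a uniform 3x3 kernel (illustrative only).
patch = [[0.0, 0.5, 1.0],
         [0.5, 1.0, 0.5],
         [1.0, 0.5, 0.0]]
kernel = [[0.1] * 3 for _ in range(3)]
print(conv_neuron(patch, kernel, bias=-0.2))  # weighted sum 0.5, minus 0.2 -> ~0.3
```

Sliding this operation across the whole image, one patch at a time, is exactly the "simple convolution" that lets stacks of such neurons perform classification, recognition and segmentation.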
separate screen, and the emotion class with the highest score was overwritten on the Haar Cascade frame. This process was performed on each of the 30 frames captured every second from the real-time camera image.

TABLE I. SUMMARY OF PROPOSED CNN ARCHITECTURE

Layer (type)                   Output Shape          Param #
conv2d_1 (Conv2D)              (None, 64, 64, 20)    520
activation_1 (Activation)      (None, 64, 64, 20)    0
max_pooling2d_1 (MaxPooling2)  (None, 32, 32, 20)    0
conv2d_2 (Conv2D)              (None, 32, 32, 50)    25050
activation_2 (Activation)      (None, 32, 32, 50)    0

*Conv2D: 2D Convolutional Layer, MaxPooling2: 2D Max pooling Layer

III. RESULT AND DISCUSSION

In this study, the Keras and TensorFlow libraries were used to train the LeNet CNN architecture and to predict emotion states with the proposed deep learning model. An Intel i7 8300 CPU was used for all experiments and for training on the custom dataset. The proposed LeNet CNN model was set up with the aforementioned parameters. Fig. 4 shows the performance metrics (training accuracy, training loss, validation accuracy and validation loss) of the proposed architecture during training and testing. According to the experimental results, the training loss was 0.0887, the training accuracy 96.43%, the validation loss 0.2725 and the validation accuracy 91.81%. These results are better than those of the studies mentioned in the introduction [9,11,12,14,16].

Fig. 4. Performance metrics of proposed architecture.
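The layer shapes and parameter counts in Table I are consistent with 5x5 kernels, 'same' padding and 2x2 max pooling on 64x64 grayscale input (20·(5·5·1+1) = 520 and 50·(5·5·20+1) = 25050). A minimal Keras sketch under those assumptions follows; the dense classifier head is our assumption, since the table is truncated after activation_2:

```python
from tensorflow.keras import layers, models

def build_lenet(input_shape=(64, 64, 1), num_classes=7):
    """LeNet-style CNN consistent with Table I: 20 and 50 conv filters,
    5x5 kernels with 'same' padding, each followed by an activation and
    2x2 max pooling. The dense head is an assumption (table truncated)."""
    return models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(20, (5, 5), padding="same"),   # 520 parameters
        layers.Activation("relu"),
        layers.MaxPooling2D((2, 2)),                 # -> (32, 32, 20)
        layers.Conv2D(50, (5, 5), padding="same"),   # 25050 parameters
        layers.Activation("relu"),
        layers.MaxPooling2D((2, 2)),                 # -> (16, 16, 50)
        layers.Flatten(),
        layers.Dense(500, activation="relu"),        # assumed head size
        layers.Dense(num_classes, activation="softmax"),
    ])
```

Calling `build_lenet().summary()` reproduces the output shapes and parameter counts listed in Table I for the convolutional layers.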
According to the confusion matrix in Fig. 5, the proposed LeNet model is more accurate at predicting the surprised, fear and neutral emotion states, and less accurate at predicting the sad emotion state.

Fig. 5. Confusion matrix of proposed architecture.

IV. CONCLUSION

This paper proposed a low-cost and functional method for real-time classification of seven different emotions from facial expressions based on the LeNet CNN architecture. In this study, facial expression pictures, which can be considered few in number, were successfully used to train the CNN, and high classification accuracy was achieved. Using the Haar Cascade library, the effect of unimportant pixels outside the facial expression was reduced. In addition, feeding the networks with single-depth pixel images not only avoided a loss in success rate but also reduced the training time and the size of the network. Using a custom database provided higher validation and test accuracy than training on existing databases alone. The real-time test model has the functionality to query each image captured every second.

Emotion estimation from facial expressions is an area of interest for many researchers. It is hoped that this study will serve as a source for studies on the early detection of diseases from facial expressions, as well as for studies of consumer behavior analysis.

ACKNOWLEDGMENT

This work was supported by the Scientific Research Projects Coordinatorship of Izmir Katip Celebi University. Project number: 2019-ÖNAP-MÜMF-0003.

REFERENCES

[1] M. Cabanac, "What is emotion?," Behavioural Processes, vol. 60, pp. 69-83, 2002.
[2] R. Roberts, "What an Emotion Is: A Sketch," The Philosophical Review, vol. 97, 1988.
[3] E. Shouse, "Feeling, emotion, affect," M/C Journal, vol. 8, no. 6, p. 26, 2005.
[4] J. Zhao, X. Mao, and L. Chen, "Speech emotion recognition using deep 1D & 2D CNN LSTM networks," Biomedical Signal Processing and Control, vol. 47, pp. 312-323, 2019.
[5] J. M. B. Fugate, A. J. O'Hare, and W. S. Emmanuel, "Emotion words: Facing change," Journal of Experimental Social Psychology, vol. 79, pp. 264-274, 2018.
[6] J. P. Powers and K. S. LaBar, "Regulating emotion through distancing: A taxonomy, neurocognitive model, and supporting meta-analysis," Neuroscience & Biobehavioral Reviews, vol. 96, pp. 155-173, 2019.
[7] R. B. Lopez and B. T. Denny, "Negative affect mediates the relationship between use of emotion regulation strategies and general health in college-aged students," Personality and Individual Differences, vol. 151, p. 109529, 2019.
[8] S. Albanie, A. Nagrani, A. Vedaldi, and A. Zisserman, "Emotion recognition in speech using cross-modal transfer in the wild," pp. 292-301, 2018.
[9] K.-Y. Huang, C.-H. Wu, Q.-B. Hong, M.-H. Su, and Y.-H. Chen, "Speech Emotion Recognition Using Deep Neural Network Considering Verbal and Nonverbal Speech Sounds," in ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 5866-5870: IEEE.
[10] M. Degirmenci, M. A. Ozdemir, R. Sadighzadeh, and A. Akan, "Emotion Recognition from EEG Signals by Using Empirical Mode Decomposition," in 2018 Medical Technologies National Congress (TIPTEKNO), 2018, pp. 1-4.
[11] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-Based Learning Applied to Document Recognition," Proceedings of the IEEE, vol. 86, pp. 2278-2324, 1998.
[12] T. Chang, G. Wen, Y. Hu, and J. Ma, "Facial Expression Recognition Based on Complexity Perception Classification Algorithm," arXiv e-prints, 2018. Available: https://ui.adsabs.harvard.edu/abs/2018arXiv180300185C
[13] K. Clawson, L. Delicato, and C. Bowerman, "Human Centric Facial Expression Recognition," 2018.
[14] H.-D. Nguyen, S. Yeom, G.-S. Lee, H.-J. Yang, I. Na, and S. H. Kim, "Facial Emotion Recognition Using an Ensemble of Multi-Level Convolutional Neural Networks," International Journal of Pattern Recognition and Artificial Intelligence, 2018.
[15] T. Cao and M. Li, "Facial Expression Recognition Algorithm Based on the Combination of CNN and K-Means," presented at the Proceedings of the 2019 11th International Conference on Machine Learning and Computing, Zhuhai, China, 2019.
[16] T. Ahmed, S. Hossain, M. Hossain, R. Islam, and K. Andersson, "Facial Expression Recognition using Convolutional Neural Network with Data Augmentation," pp. 1-17, 2019.
[17] N. Christou and N. Kanojiya, "Human Facial Expression Recognition with Convolution Neural Networks," Singapore, 2019, pp. 539-545: Springer Singapore.
[18] A. Sajjanhar, Z. Wu, and Q. Wen, "Deep learning models for facial expression recognition," in 2018 Digital Image Computing: Techniques and Applications (DICTA), 2018, pp. 1-6: IEEE.
[19] J. Chen, Y. Lv, R. Xu, and C. Xu, "Automatic social signal analysis: Facial expression recognition using difference convolution neural network," Journal of Parallel and Distributed Computing, vol. 131, pp. 97-102, 2019.
[20] M. Lyons, S. Akamatsu, M. Kamachi, and J. Gyoba, "Coding Facial Expressions with Gabor Wavelets," pp. 200-205, 1998.
[21] D. Lundqvist, A. Flykt, and A. Öhman, "The Karolinska directed emotional faces (KDEF)," CD ROM from Department of Clinical Neuroscience, Psychology section, Karolinska Institutet, vol. 91, p. 630, 1998.