An Efficient Model For Facial Expression Recognition With Music Recommendation
https://doi.org/10.1007/s40009-023-01346-4
SHORT COMMUNICATION
Abstract An AI interactive robot can identify human faces, determine the emotions of the person it is chatting with, and then pick appropriate replies using face recognition and emotion recognition algorithms that analyze facial expressions. Deep learning is currently the most effective method for carrying out such tasks. Using deep learning and a few Python modules, we have developed a real-time system that can recognize human faces, determine human emotions, and even provide users with music recommendations. The models presented in this article are trained on the OAHEGA and FER-2013 datasets. The accuracy of our proposed model was compared with several baseline approaches, and the results were quite affirmative. Our CNN model can predict six emotions: anger, fear, happiness, neutral, sadness, and surprise.

Keywords Face recognition · Facial expression recognition · Artificial intelligence · CNN · Deep learning

Significance statement: The proposed approach predicts a person's emotion and recognizes their face from facial expressions, and it recommends music based on the detected emotion. Such a system can be of great use in several applications.

* Brijesh Bakariya
brijeshmanit@gmail.com

Krishna Kumar Mohbey
kmohbey@gmail.com

Arshdeep Singh
ishir.sagoo@gmail.com

Harmanpreet Singh
singhharmanpreet21@gmail.com

Pankaj Raju
rajupankaj20@gmail.com

Rohit Rajpoot
rohitrajpoot7696@gmail.com

1 Department of Computer Science and Engineering, I.K. Gujral Punjab Technical University Campus, Hoshiarpur, Punjab, India
2 Department of Computer Science, Central University of Rajasthan, Ajmer, India

Facial expressions reflect a person's state of mind: they mirror emotions impeccably and are kingpins of nonverbal human communication. Charles Darwin proposed the concept of universal emotions, arguing that emotional experiences are hardwired into every human being [1]. Several past works [2–5] have applied the study of facial expressions in different application contexts. Reference [6] used face recognition and emotion recognition outputs for a humanoid robot based on convolutional neural networks (CNNs).

We propose a real-time system consisting of three phases. In the first phase, we use the Haar cascade technique [7] to locate human faces, with "haarcascade_frontalface_default.xml" as the face detection template. We use OpenCV to connect to the camera and capture the image before loading the Haar cascade classifier. The image is then converted to grayscale, and the detectMultiScale method locates faces of varying sizes within it. If a face is detected, the image is shown in RGB and the cv2.rectangle function draws a rectangle at coordinates (x, y) with width "w" and height "h"; otherwise, "No Face Detected." is shown on the screen.
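A minimal sketch of this detection step is given below, assuming a webcam at device index 0 and the cascade file bundled with OpenCV; the variable names are illustrative rather than taken from our implementation.

```python
import cv2

# Load the Haar cascade template shipped with OpenCV.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

# Capture a single frame from the camera (device index 0 assumed).
cap = cv2.VideoCapture(0)
ret, frame = cap.read()
cap.release()

if ret:
    # detectMultiScale works on grayscale input and returns one
    # (x, y, w, h) box per face found at varying scales.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

    if len(faces) == 0:
        print("No Face Detected.")
    else:
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        for (x, y, w, h) in faces:
            # Draw the bounding rectangle reported by the classifier.
            cv2.rectangle(rgb, (x, y), (x + w, y + h), (255, 0, 0), 2)
```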
We use the cv2.imread(img) function to capture the image details, and then, using face recognition encodings, the images are processed and compared against the other encoded images to determine whether the face is already recognized or the current instance is a newly detected face.
At the GUI, the user is provided with an "Add the person's face" button to add the recognized face to the database for future reference and to reduce the time consumed. The entire database is read and encoded only once, at setup time, not continuously. Newly recognized faces are encoded and appended to the list with an alphabetically ordered index. The detected face image is reduced to 48 by 48 pixels and is then ready for the next phase.
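The text does not name the encoding library, but the described workflow (encode the database once at setup, then compare each new face against the stored encodings) matches the widely used face_recognition package. The sketch below assumes that package; the known_faces/ directory and file names are hypothetical.

```python
import os
import face_recognition

# Encode the whole database once at setup time, not per frame.
known_encodings, known_names = [], []
for fname in sorted(os.listdir("known_faces")):   # alphabetic index order
    image = face_recognition.load_image_file(os.path.join("known_faces", fname))
    encodings = face_recognition.face_encodings(image)
    if encodings:                                  # skip images with no face
        known_encodings.append(encodings[0])
        known_names.append(os.path.splitext(fname)[0])

# Compare a freshly captured face against the encoded database.
probe = face_recognition.load_image_file("captured_face.jpg")
probe_encodings = face_recognition.face_encodings(probe)
if probe_encodings:
    matches = face_recognition.compare_faces(known_encodings, probe_encodings[0])
    name = known_names[matches.index(True)] if True in matches else "Unknown"
    print(name)
```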
In the second phase, we develop a CNN model for facial emotion recognition that can automatically and adaptively learn spatial hierarchies of features, moving from low-level to high-level patterns in grid-structured data, as proposed in [8]. We combine the FER-2013 (https://www.kaggle.com/datasets/msambare/fer2013) and OAHEGA [9] datasets, obtaining 43,003 images for training and 8856 images for testing across six emotions: anger, fear, happy, neutral, sad, and surprise. The developed model has 12 layers. The initial layers are the input layers, which receive 48-by-48-pixel images on a single input channel. The convolution and pooling layers perform feature extraction, and the fully connected layers classify the emotion on the face, as illustrated in Fig. 1.
The hyperparameters were tuned as learning rate (lr) = 0.0001, decay = 1e-6, batch size = 32, epochs = 24, and optimizer = Adam. The confusion matrix for the process is depicted in Table 1.
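The text specifies the kernel counts (32–64–64–128), the 48 × 48 single-channel input, six output classes, and the hyperparameters above, but not the exact layer-by-layer layout. The following Keras sketch is therefore one plausible reconstruction, not our verified code; the dense-layer width, dropout rates, and pooling placement are assumptions (Dense(4096) is chosen because it puts the parameter count near the reported ~19 M).

```python
from tensorflow.keras import layers, models, optimizers

# One plausible 12-layer CNN over 48x48 grayscale faces,
# following the reported 32-64-64-128 kernel progression.
model = models.Sequential([
    layers.Input(shape=(48, 48, 1)),
    layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
    layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),
    layers.Flatten(),                       # 128 * 6 * 6 = 4608 features
    layers.Dense(4096, activation="relu"),  # assumed width (~19 M params total)
    layers.Dropout(0.5),
    layers.Dense(6, activation="softmax"),  # anger ... surprise
])

# Reported hyperparameters: lr = 0.0001, Adam, batch 32, 24 epochs.
# The reported decay = 1e-6 corresponds to the legacy Keras
# learning-rate `decay` argument (a schedule in recent versions).
model.compile(
    optimizer=optimizers.Adam(learning_rate=0.0001),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
# model.fit(x_train, y_train, batch_size=32, epochs=24,
#           validation_data=(x_val, y_val))
```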
In the third phase, an algorithm analyzes the emotions over a threshold time window and extracts the predominant emotion. Beforehand, a separate CSV file named after each emotion under study is built with suitable songs; this forms our music database. The Python library pygame and its mixer sub-library are used to control music playback. Once the user initiates the "Suggest a Song" operation on the graphical user interface (GUI), the algorithm uses the recognized emotion to select a song at random from the corresponding CSV file. In parallel, the pygame music mixer is started and the selected song is played; this parallel process can be controlled from the GUI. The entire process is depicted in Fig. 2.
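A sketch of the song-selection and playback step follows, assuming one CSV file per emotion (e.g. a hypothetical happy.csv) whose rows each hold the path to a song file; the file layout is an assumption, not taken from our implementation.

```python
import csv
import random
import pygame

def suggest_song(emotion: str) -> str:
    """Pick a random song from the CSV file named after the emotion."""
    with open(f"{emotion}.csv", newline="") as f:
        songs = [row[0] for row in csv.reader(f) if row]
    return random.choice(songs)

# Play the chosen track with the pygame mixer; playback runs in
# parallel with the GUI, which can pause or stop it via the mixer API.
pygame.mixer.init()
pygame.mixer.music.load(suggest_song("happy"))
pygame.mixer.music.play()
# GUI controls map to pygame.mixer.music.pause() / unpause() / stop()
```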
The model consists of 12 layers with 32–64–64–128 kernels and 19,011,142 trainable parameters. For an image of M by N pixels and a CNN with k × k kernels, the estimated complexity is O(MNk²).
It consumed 0.70252 s and achieved a training accuracy of 0.9413 with a loss of 0.1687 and a validation accuracy of 0.7302 with a loss of 1.0412. The technique proposed in [10] has an accuracy of 0.65 and that in [11] an accuracy of 0.68. These were surpassed by the smoothed deep neural network ensemble proposed in [12], which achieved an accuracy of 0.71, and by the ResNet-18-based Bayesian convolutional neural network of Ref. [13], which achieved an accuracy of 0.72. Our proposed model surpasses all these methods with an accuracy of 0.732.
References

2. Boragule A, Akram H, Kim J, Jeon M (2022) Learning to resolve uncertainties for large-scale face recognition. Pattern Recogn Lett 160:58–65
3. Basha SM, Rajput DS (2018) Parsing based sarcasm detection from literal language in tweets. Recent Patents Comp Sci 11(1):62–69
4. Basha SM, Rajput DS, Thabitha TP, Srikanth P, Pavan Kumar CS (2019) Classification of sentiments from movie reviews using KNIME. In: Proceedings of the 2nd international conference on data engineering and communication technology: ICDECT 2017, pp 633–639. Springer, Singapore
5. Varshney N, Bakariya B, Kushwaha AKS (2022) Human activity recognition using deep transfer learning of cross position sensor based on vertical distribution of data. Multimed Tools Appl 81(16):22307–22322
6. Dwijayanti S, Iqbal M, Suprapto BY (2022) Real-time implementation of face recognition and emotion recognition in a humanoid robot using a convolutional neural network. IEEE Access 10:89876–89886
7. Viola P, Jones M (2001) Rapid object detection using a boosted cascade of simple features. In: Proceedings of the 2001 IEEE computer society conference on computer vision and pattern recognition (CVPR 2001), vol 1, pp I-I. IEEE, New York
8. Varshney N, Bakariya B (2021) Deep convolutional neural model for human activities recognition in a sequence of video by combining multiple CNN streams. Multimed Tools Appl, 1–13
9. Kovenko V, Shevchuk V (2021) OAHEGA: emotion recognition dataset. Mendeley Data, V2. https://doi.org/10.17632/5ck5zz6f2c.2
10. Meena G, Mohbey KK, Indian A, Kumar S (2022) Sentiment analysis from images using VGG19 based transfer learning approach. Proc Comp Sci 204:411–418
11. Yang L, Zhang H, Li D, Xiao F, Yang S (2021) Facial expression recognition based on transfer learning and SVM. J Phys Conf Ser 2025(1):012015
12. Benamara NK, Val-Calvo M, Alvarez-Sanchez JR, Diaz-Morcillo A, Ferrandez-Vicente JM, Fernandez-Jover E, Stambouli TB (2021) Real-time facial expression recognition using smoothed deep neural network ensemble. Integr Comput Aided Eng 28(1):97–111
13. Tai Y, Tan Y, Gong W, Huang H (2021) Bayesian convolutional neural networks for seven basic facial expression classifications. arXiv preprint arXiv:2107.04834

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.