
2022-IEEE International Interdisciplinary Humanitarian Conference for Sustainability (IIHC-2022), November 18th&19th 2022

ANN Based Facial Emotion Detection and Music Selection

Dr. Merin Thomas
Associate Professor
Department of Computer Science and Engineering
Faculty of Engineering and Technology
Jain (Deemed-to-be University)
Bangalore, India
merin.thomas@jainuniversity.ac.in

Shreenidhi H S
Asst. Prof.
Department of Computer Science and Engineering
Faculty of Engineering and Technology
Jain (Deemed-to-be University)
Bangalore, India
hs.shreenidhi@jainuniversity.ac.in
978-1-6654-5687-6/22/$31.00 ©2022 IEEE | DOI: 10.1109/IIHC55949.2022.10060593

Abstract: Everybody likes to listen to music that matches their mood, but choosing music manually based on mood is a task that takes time and effort. The face is crucial when anticipating a person's emotions and mood. In the system suggested here, we create a prototype for a dynamic music recommendation system based on human emotions. Songs for each emotion are learned from human listening patterns. The emotion on a real person's face is recognised using an integration of feature extraction and machine learning techniques. Once the mood is determined from the input image, the appropriate music is played to keep the user's attention. The system consists of two stages: training, and real-time emotion recognition and music selection. The suggested study has demonstrated a notable level of performance in terms of recognition and music choice.

Keywords—emotion recognition, facial recognition

I. INTRODUCTION

The human face is an important organ of the body, and it plays an especially vital function in determining an individual's behaviour and emotional condition. It is a highly laborious and time-consuming operation to manually segregate a list of songs and generate a suitable playlist depending on an individual's emotional qualities, yet it is something that must be done.

A number of different algorithms have been proposed and developed to automate the process of creating playlists. However, the existing algorithms now in use are computationally slow, have a lower level of accuracy, and at times even necessitate supplementary gear such as EEG sensors. The proposed method, which is based on the extraction of facial expressions, automatically builds a playlist, decreasing the time and effort otherwise required to carry out the process manually.

The integration of Facial Emotion Recognition (FER) and Music Information Retrieval (MIR) into conventional music players made it possible to automatically categorise a playlist according to a wide range of feelings and states of mind. MER is a technique used to identify an extracted face by taking into consideration its numerous facial features and how they correspond to different categories of feelings and states of mind. Even though both MER and MIR offer the capability of avoiding the manual segregation of songs and creation of playlists, they still cannot fully realise a human emotion-controlled music player. Although the human voice and gestures are typical ways of communicating feelings, the most natural and ancient manner of doing so is through facial expression. Feelings, emotions, and mood may all be communicated through the face.

Fear, disgust, anger, surprise, sadness, happiness and neutrality are the basic human emotions, and they can be broken down into more specific categories. These feelings can also serve as umbrella terms for a wide range of different states of mind, such as cheerfulness, amongst others. The corresponding facial muscular contortions are quite subtle, yet being able to differentiate between them can distinguish a wide range of expressions. Because an emotion is strongly influenced by its surrounding context, different people, or even the same person, may express the same feeling in different ways.

Machine learning and neural networks have shown promising outcomes when applied to such classification tasks. Machine learning algorithms have already been put to good use in pattern identification and classification, which suggests that they may also be applicable to the detection of emotional states. Because of the rise of digital music, it is crucial to create a system that can propose songs based on individual tastes.

II. LITERATURE SURVEY

Interaction with a human being is necessary in the traditional method [5] of playing music that corresponds to a person's emotional state. The transition to technologies based on computer vision will make the automation of such systems possible. To do this, an algorithm is utilised to categorise human expressions, and a music track is then played in accordance with the current emotion that has been identified. This saves the time and effort that would otherwise be required to manually browse through a collection of songs to find one that corresponds to a person's current mood. The facial features of a person are extracted using Principal Component Analysis, and a Euclidean-distance classifier allows the person's expressions to be recognised. Compared to alternative approaches, using an integrated camera to record a person's facial expressions reduces the cost of the system's design.

A wide variety of human experiences and emotions can be captured and understood through the medium of song. Emotion-based classification systems that can be relied
upon would greatly aid in understanding their significance. Studies attempting to categorise music according to how it makes a listener feel have not been particularly successful thus far. Users should not have to spend a lot of time manually selecting and reordering the songs in a playlist. Automatically creating a playlist based on the user's mood is made possible by a new technology that uses facial expressions to determine what kind of music will make the user feel good.

One of the most fascinating and mysterious aspects of music [10] is its capacity to evoke feelings in its audience. Music has the power to change not only the listener's emotional state but also their physical state. In this paper, we examine a number of different classification-based algorithms in order to outline a clear methodology for doing two things: i) categorising songs into four distinct moods, and ii) detecting a user's mood from facial expressions so that a playlist can be generated specifically for that person. The feature extraction method relied on geometry alone, that is, the geometry or principal prominent points of key facial features such as the lips and eyes.

Over the past few years, as a result of the development and application of big data, deep learning has garnered an increasing amount of attention. Convolutional neural networks, a type of deep learning neural network, are an extremely significant component in facial image identification. In this paper, the micro-expression identification technology of convolutional neural networks is combined with an automatic music recommendation algorithm to build a model that identifies facial micro-expressions and suggests music according to related moods.

A novel artificial neural network based facial emotion detection system is proposed here that works in real time: the person's mood is detected, and music is recommended based on the detected mood.

III. PROPOSED SYSTEM

Real-time mood recognition is the main goal of the application, known as the mood-based music recommendation system. It is a prototype for a new product that has two primary modules: facial expression recognition/mood detection and music suggestion.

A. FACIAL MOOD DETECTION

This process consists of two stages:

• Face Detection: A camera is utilised to record an image of the user in real time. Once the photo has been taken, the frame acquired from the webcam feed is transformed to a grayscale image to increase the performance of the classifier used to identify the face present in the picture. After the conversion, the image is delivered to the classifier algorithm, which, with the assistance of feature extraction techniques, extracts the face from the frame of the web camera stream. The extracted face is broken down into its component parts, which are then fed into an artificial neural network (ANN) model that has been trained to identify the user's emotional state. The classifier is trained on a set of images so that, when presented with an entirely new and unknown set of images, it can extract the positions of facial landmarks based on the knowledge acquired from the training set and return the coordinates of the detected facial landmarks.

• Mood Detection: The expression of an emotion on the face can be classified as happy, angry, sad, neutral, surprise, fear or disgust. MobileNet, an architecture model for image classification and mobile vision, is utilised for this task. Running MobileNet, or applying transfer learning to it, takes an extremely low amount of computation power. Because of this, it is an excellent choice for mobile devices, embedded systems, and computers with limited computing efficiency, without sacrificing the quality of the results. Depth-wise separable convolutions are used in its construction of lightweight deep neural networks. Combining the FER 2013 dataset and the MMA Facial Expression Recognition dataset from Kaggle produced the dataset utilised for training. The FER 2013 dataset contains grayscale images of 48 x 48 pixels. The MMA Facial Expression Recognition dataset contains photos that vary in their characteristics; therefore, all of these photos were processed in the same manner as the images in the FER 2013 dataset and then pooled to produce a larger dataset of 20,045 training images and 7,724 test images. To train and validate our model, MobileNet was combined with Keras for seven classes: happy, angry, neutral, sad, surprise, fear and disgust. The proposed block diagram is shown in Figure 1.

Fig 1: Facial Mood training and recognition

B. MUSIC RECOMMENDATION MODULE

A dataset of both Hindi and English songs, categorised according to mood, was found on Kaggle. An investigation was carried out to find a reliable cloud storage platform that could save, retrieve, and query this music data in response to specific user requests. Once the facial expression is identified by the classifier, it is taken as the mood of the person, and the corresponding songs stored in the database are selected and played, as shown in Figure 2.

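The mood-detection stage pairs MobileNet with Keras for the seven emotion classes. A transfer-learning sketch of that setup is shown below; the input size, head width, and training settings are illustrative assumptions (the paper does not specify them), and the 48 x 48 grayscale FER images are assumed to be upscaled and stacked to RGB before reaching the network.

```python
# Transfer-learning sketch for the mood-detection stage: an ImageNet-pretrained
# MobileNet backbone with a small classification head for the seven emotions.
# Input shape, head width, and optimizer are assumptions for this sketch.
from tensorflow.keras.applications import MobileNet
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model

EMOTIONS = ["happy", "angry", "neutral", "sad", "surprise", "fear", "disgust"]

def build_emotion_model(input_shape=(128, 128, 3), weights="imagenet"):
    # MobileNet expects 3-channel input, so grayscale faces would be
    # resized and replicated across channels before training.
    base = MobileNet(weights=weights, include_top=False, input_shape=input_shape)
    base.trainable = False                    # freeze the pretrained backbone
    x = GlobalAveragePooling2D()(base.output)
    x = Dense(128, activation="relu")(x)      # small head; width is illustrative
    outputs = Dense(len(EMOTIONS), activation="softmax")(x)
    model = Model(inputs=base.input, outputs=outputs)
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Freezing the backbone is what keeps the computation cost low, which is the property the paper highlights when motivating MobileNet for embedded and low-power devices.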
Fig 2: Music recommendation Module

IV. SYSTEM ARCHITECTURE

The flow diagram represents the functional formation of the music player, as shown in Figure 3. Initially the model is trained using the input dataset: preprocessing takes place, the model is trained on the dataset and then tested, the facial expression is captured using the webcam, and emotion detection is performed using the model. The model classifies the facial expression as happy, sad, angry, depressed, etc., analyses the exact emotion, and plays the music. A behavioural diagram is the same thing as an activity diagram, which means that it displays the behaviour of a system. An activity diagram depicts the control flow from a starting point to an ending point, outlining the many decision routes available while the activity is being carried out and demonstrating how they connect to one another.

Figure 3: Proposed framework

V. IMPLEMENTATION

For the purposes of training our ANN architecture, we collected data from the FER2013 database. It was
developed with the help of the Google image search API, and it was presented at the ICML 2013 Challenges. The sizes of the faces in the database have been automatically standardized to 48 x 48 pixels. The FER2013 database comprises a total of 35,887 photos tagged with 7 expressions (28,709 training images, 3,589 validation images, and 3,589 test images). A set number of pictures is used to depict each feeling. We took live frames from a web camera using the OpenCV library, and then used the Haar Cascades approach to identify the
faces of the users. Adaboost is the learning algorithm that forms the foundation of Haar Cascades.

Algorithm:

Training phase
Select the FER2013 database
Preprocess the dataset
Read the images of N x N
Resize the images
Extract the facial features
Select a training set of M sample images
Train the ANN model using the Adaboost learning algorithm

Testing phase
Capture the real-time image using the webcam
Process the image to extract the facial characteristics
Detect the emotion using Haar Cascades
Predict the emotion
Select the songs based on user interest and the predicted emotion

VI. EXPERIMENT RESULT AND DISCUSSION

In the proposed work, the Adaboost learning method has been built to select a number of significant features from a huge set, and the ImageDataGenerator class in the Keras library has been implemented for image augmentation. These parameters are utilized in the process: rotation range = 10, width shift range = 0.1, zoom range = 0.1, height shift range = 0.1, and horizontal flip = True.

The ANN model utilized has a total of two fully connected layers, four pooling layers, and four convolutional layers. Additionally, the ReLU function provides non-linearity in the model, batch normalization normalizes the activations of the preceding layer at each batch, and L2 regularization applies penalties to the model's parameters; both techniques are used in conjunction with the ANN model. The model is trained using the FER 2013 database, which includes seven emotions (happiness, anger, sadness, disgust, neutral, fear and surprise). Before being input into the ANN model, the pictures of the detected faces were rescaled to 48 x 48 pixels and converted to grayscale.

Table 1 gives the confusion matrix of the recognition of the seven classes of facial expression. Based on the recognised mood of the person, the songs are selected and played.

CONCLUSION

An important step forward in understanding human behaviour has been developed as part of the proposed effort. Seven common human emotions are explored here. An emotion-based music player satisfies the need to classify musical selections in accordance with emotional states. Both a training and a testing phase make up the suggested system. In the training phase, the Adaboost learning algorithm is utilised to train on the FER2013 database. Haar Cascades is used to analyse facial expressions in real time, allowing for the identification of seven distinct human emotions and the subsequent selection of an appropriate soundtrack.

REFERENCES

[1] D. Schnaufer and B. Peterson, "Realizing 5G sub-6-GHz massive MIMO using GaN," Microwaves & RF, pp. 4–8, 2018.
[2] E. O'Connell et al., "Challenges associated with implementing 5G in manufacturing," Telecom, vol. 1, no. 1, pp. 48–67, 2020.
[3] C. A. Balanis, Antenna Theory: Analysis and Design, Fourth edition, Hoboken, New Jersey, John Wiley & Sons, Inc., 2016.
[4] A. Taggu et al., "A dual band omni-directional antenna for WAVE and Wi-Fi," 2017 2nd International Conference on Communication Systems, Computing and IT Applications (CSCITA), IEEE, pp. 1–4, 2017.
[5] D. Nataraj and G. Karunakar, "Design and research of miniaturized microstrip slot with and without defected ground structure," Int. J. Recent Technol. Eng., vol. 8, no. 2, pp. 391–398, 2019.
[6] Y. Liu et al., "Some recent developments of microstrip antenna," International Journal of Antennas and Propagation, vol. 2012, 2012.
[7] R. Garg et al., Microstrip Antenna Design Handbook, Artech House, 2001.
[8] A. Pandey, Practical Microstrip and Printed Antenna Design, Artech House, 2019.
[9] R. Mondal et al., "Compact ultra-wideband antenna: improvement of gain and FBR across the entire bandwidth using FSS," IET Microwaves, Antennas & Propagation, vol. 14, no. 1, pp. 66–74, 2019.
[10] Z. Xing, K. Wei, L. Wang, and J. Li, "Dual-band RFID antenna for 0.92 GHz near-field and 2.45 GHz far-field applications," International Journal of Antennas and Propagation, vol. 2017, pp. 1–9, 2017.
[11] A. Roy, S. Bhunia, D. C. Sarkar, and P. P. Sarkar, "Slot loaded compact microstrip patch antenna for dual band operation," Progress in Electromagnetics Research C, vol. 73, pp. 145–156, 2017.
[12] M. M. Hasan, M. R. I. Faruque, and M. T. Islam, "Dual band metamaterial antenna for LTE/Bluetooth/WiMAX system," Scientific Reports, vol. 8, no. 1240, pp. 1–17, 2018.
[13] A. Salam, A. A. Khan, and M. S. Hussain, "Dual band microstrip antenna for wearable applications," Microwave and Optical Technology Letters, vol. 56, no. 4, pp. 916–918, 2014.
[14] M. K. Khandelwal, B. K. Kanaujia, and S. Kumar, "Defected ground structure: Fundamentals, analysis, and applications in modern wireless trends," International Journal of Antennas and Propagation, vol. 2017, pp. 1–22, 2017.
[15] F. Y. Zulkifli, E. T. Rahardjo, and D. Hartanto, "Mutual coupling reduction using dumbbell defected ground structure for multiband microstrip antenna array," Progress In Electromagnetics Research Letters, vol. 13, pp. 29–40, 2010.
[16] S. S. Kumar, S. H. Bharathi and M. Archana, "Non-negative matrix based optimization scheme for blind source separation in automatic speech recognition system," 2016 International Conference on Communication and Electronics Systems (ICCES), Coimbatore, 2016, pp. 1–6, doi: 10.1109/CESYS.2016.7889860.
[17] N. P. Yadav, "Plus shaped notch loaded rectangular patch with D.G.S. antenna for multiband operation," in Proc. Antenna Test & Measurement Society (ATMS India-16), 2016, pp. 1–5.
[18] D. Fistum, D. Mali, and M. Ismail, "Bandwidth enhancement of rectangular microstrip patch antenna using defected ground structure," Indonesian Journal of Electrical Engineering and Computer Science, vol. 3, no. 2, pp. 428–434, 2016.
[19] A. Zaidi et al., "High gain microstrip patch antenna, with PBG substrate and PBG cover, for millimetre wave applications," 2018 4th International Conference on Optimization and Applications (ICOA), IEEE, pp. 1–6, 2018.
[20] S. S. Kumar, B. K. Aishwarya, K. N. Bhanutheja and M. Chaitra, "Breath to speech communication with fall detection for elder/patient with take care analytics," 2016 IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT), Bangalore, 2016, pp. 527–531, doi: 10.1109/RTEICT.2016.7807877.

Table 1: Confusion matrix of the seven facial expression classes (values in %)

          Neutral    Sad       Fear      Anger     Disgust   Happy     Surprise
Neutral   92.35      1.012     2.01      2.125     1.385     0         1.118
Happy     2.15       0         0.25      0.127     0         97.25     0.223
Surprise  0          1.2       0.012     0         0.005     1.2       97.583
Fear      1.2        0.825     95.023    0         1.75      0         1.202
Anger     2.53       1.3       1.385     93.432    0.123     0         1.23
Sad       3.234      92.85     1.35      0.85      0.12      1.49      0.106
Disgust   0.025      1.0123    0.356     0.61      97.7077   0         0.289

Average Recognition = 95.17081%
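The reported average recognition rate follows directly from Table 1: it is the mean of the diagonal entries, i.e. the per-class correct-classification percentages. A quick check:

```python
# Recompute the average recognition rate from the diagonal of Table 1,
# i.e. the percentage of correctly classified samples for each emotion.
per_class_accuracy = {
    "Neutral": 92.35, "Happy": 97.25, "Surprise": 97.583, "Fear": 95.023,
    "Anger": 93.432, "Sad": 92.85, "Disgust": 97.7077,
}
average = sum(per_class_accuracy.values()) / len(per_class_accuracy)
print(f"Average recognition = {average:.5f}%")  # prints: Average recognition = 95.17081%
```

Note that this is an unweighted mean over the seven classes; it matches the reported 95.17081% exactly, which suggests the authors averaged per-class rates rather than weighting by class frequency.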
