Multi-label Emotion Classification using Machine Learning and Deep Learning Methods
Drashti Kher and Kalpdrum Passi
School of Engineering and Computer Science, Laurentian University, Sudbury, Ontario, Canada
Keywords: Multi-label Emotion Classification, Twitter, Python, Deep Learning, Machine Learning, Naïve Bayes, SVM,
Random Forest, KNN, GRU based RNN, Ensemble Methods, One-way ANOVA.
Abstract: Emotion detection in online social networks benefits many applications, such as personalized advertisement services and recommendation systems. Emotion can be identified from various sources such as text, facial expressions, images, speech, paintings, and songs. Emotion detection can be performed with various machine learning techniques. Traditional emotion detection techniques mainly focus on multi-class classification and ignore the co-existence of multiple emotion labels in one instance. This research work focuses on classifying multiple emotions from data in order to handle complex data with the help of different machine learning and deep learning methods. Before modeling, the data is first analysed and then cleaned. Data preprocessing is performed in steps such as stop-word removal, tokenization, stemming, and lemmatization, which are carried out with the Natural Language Toolkit (NLTK). All input variables are converted into vectors with standard text encoding techniques such as word2vec, bag-of-words, and term frequency-inverse document frequency (TF-IDF). The research is implemented in the Python programming language. To solve the multi-label emotion classification problem, both machine learning and deep learning methods were used. The evaluation parameters accuracy, precision, recall, and F1-score were used to evaluate the performance of the classifiers: Naïve Bayes, support vector machine (SVM), Random Forest, K-nearest neighbour (KNN), and a GRU (Gated Recurrent Unit) based RNN (Recurrent Neural Network) with the Adam and RMSprop optimizers. On the challenging SemEval-2018 Task 1: E-c multi-label emotion classification dataset, the GRU-based RNN with the RMSprop optimizer achieves an accuracy of 82.3%, Naïve Bayes achieves the highest precision of 0.80, Random Forest achieves the highest recall of 0.823, and SVM achieves the highest F1-score of 0.798. In addition, a one-way Analysis of Variance (ANOVA) test was performed on the mean values of the performance metrics (accuracy, precision, recall, and F1-score) for all methods.
1 INTRODUCTION

With the increasing popularity of online social media, people like to express their emotions or share meaningful events with other people on social network platforms such as Twitter, Facebook, personal notes, blogs, novels, emails, chat messages, and news headlines (Zhang et al., 2017).
Emotion is a strong feeling that derives from a person's mood or from interactions with others. Many ways are available for detecting emotions from textual data; for example, social media has made our lives easier, and by pressing just one button anyone can share a personal opinion with the whole world.
Emotion can be detected from data with the help of data mining techniques, machine learning techniques, and neural networks (Avetisyan et al., 2016). That survey states that emotion detection approaches can be classified into three types: keyword-based or lexical-based, learning-based, and hybrid, and that the most commonly used classifiers are SVM, Naive Bayes, and hybrid algorithms (Avetisyan et al., 2016). Emotion mining is of great interest in many fields of study, such as cognitive science, neuroscience, and psychology (Yadollahi et al., 2017). Although emotion mining from text is still in its early stages and has a long way to go, developing systems that can detect emotions from text has many applications.
An intelligent tutoring system can decide on teaching materials based on the user's mental state and feelings in e-learning applications. A computer can monitor the user's emotions to suggest appropriate music or movies in human-computer interaction (Yadollahi et al., 2017). Moreover, the output of an emotion-mining system can serve as input to other systems. For instance, Rangel and Rosso (Yadollahi et al., 2017) (Rangel and Rosso, 2016) use the emotions identified within a text for author profiling, in particular identifying the writer's age and gender. Last but not least, psychologists can understand patients' emotions and consequently predict their state of mind. Over a longer period of time, they can detect whether a patient is facing depression or stress, which is extremely helpful since the patient can then be referred to counselling services (Yadollahi et al., 2017). Emotion detection has been studied for text, facial expressions, images, speech, paintings, songs, etc. Among these, recorded speech and facial expressions contain the most dominant clues and have been studied most extensively (Busso et al., 2004) (Wieczorkowska et al., 2006). Several types of text can convey emotions, such as personal notes, emails, blogs, novels, news headlines, and chat messages. In particular, popular social networking websites such as Facebook, Twitter, and Myspace are convenient places to share one's feelings easily and widely.

1.1 Multi-label Emotion Classification

Emotion mining is a multi-label classification problem that requires predicting several emotion scores from a given sequence. Any given sequence can express more than one emotion, so the problem is posed as a multi-label classification problem rather than a multi-class classification problem. Both machine learning and deep learning were used in this research to solve the problem.

1.1.1 Machine Learning based Approach

For the machine learning models, data cleaning, text preprocessing, stemming, and lemmatization were performed on the raw data. The text data was transformed into vectors using the TF-IDF method, and multiple models were then used to predict each emotion: SVM, Naive Bayes, Random Forest, and KNN classifiers were used to build the machine learning solution. After training, the performance metrics were plotted for each model and each emotion label as bar plots.

1.1.2 Deep Learning based Approach

For deep learning, the dataset is loaded, preprocessed, and encoded before the deep learning techniques are applied. Since RNN-based models perform well on text data, a GRU model was built with an attention mechanism and trained for multiple epochs to obtain the best accuracy.

2 DATA AND PREPROCESSING

In this research, 10,983 English tweets were used for multi-label emotion classification (SemEval-2018, 2018) (Mohammed et al., 2018). The emotion classification dataset includes the eight basic emotions of Plutchik's (1980) emotion model (joy, sadness, anger, fear, trust, disgust, surprise, and anticipation) (Jabreel and Moreno, 2019), as well as a few other emotions that are common in tweets, namely love, optimism, and pessimism. Python 3.7.4 was used for data preprocessing, multi-label emotion classification, and data visualization.
Data preprocessing is a crucial data mining step that transforms raw data into a useful and efficient format. Real-world data is frequently inconsistent, incomplete, or missing in specific attributes and is likely to contain many errors. Preprocessing is a proven way of resolving such issues and prepares raw data for further processing. Different tools are available for data preprocessing. Data preprocessing is divided into a few stages, which are shown in Figure 1.
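As a concrete illustration of these preprocessing stages, the sketch below applies stop-word removal, tokenization, and lemmatization with NLTK to a column of tweets. This is a minimal sketch under stated assumptions: the file name, the column names, and the cleaning regular expression are illustrative, not the authors' exact pipeline.

```python
import re
import pandas as pd
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

# NLTK resources used below (downloaded once).
nltk.download("punkt")
nltk.download("stopwords")
nltk.download("wordnet")

STOP_WORDS = set(stopwords.words("english"))
LEMMATIZER = WordNetLemmatizer()

def preprocess(tweet: str) -> str:
    """Lower-case, strip URLs/mentions/punctuation, tokenize,
    remove stop-words, and lemmatize a single tweet."""
    tweet = tweet.lower()
    tweet = re.sub(r"http\S+|@\w+|[^a-z\s]", " ", tweet)
    tokens = word_tokenize(tweet)
    tokens = [LEMMATIZER.lemmatize(t) for t in tokens if t not in STOP_WORDS]
    return " ".join(tokens)

# Assumed layout: a tab-separated SemEval-2018 E-c training split with a
# "Tweet" column and one binary column per emotion label.
df = pd.read_csv("2018-E-c-En-train.txt", sep="\t")
df["clean_tweet"] = df["Tweet"].apply(preprocess)
```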
The preprocessed text is transformed into numbers by using the methods given below:
Bag of Words (BOW)
Term Frequency-Inverse Document Frequency (TF-IDF)
It is usually better to use TF-IDF rather than BOW, as the TF-IDF feature engineering technique also preserves some of the semantic nature of the sequence. For this research, the TF-IDF feature engineering technique was used to encode tokens as numbers.

Naïve Bayes: Naive Bayes is a machine learning classifier used to solve classification problems. It relies on Bayes' theorem for training and can solve diagnostic and predictive problems. Bayesian classification provides a useful point of view for evaluating and understanding many learning algorithms; it calculates explicit probabilities for hypotheses and is robust to noise in the input data (Hemalatha et al., 2013). In this multi-label classification, a separate Naive Bayes model is trained to predict each output variable.

Support Vector Machine: The support vector machine is a supervised, distance-based learning model that is extensively used for classification and regression. The main aim of SVM is to find an optimal separating hyperplane that correctly classifies data points and separates the points of two classes as far as possible, thereby minimizing the risk of misclassifying the training samples and unseen test samples (García-Gonzalo et al., 2016). This means that the two classes have maximum distance from the separating hyperplane.

Random Forest: Random Forest is an ensemble learning method for classification and regression. Each tree is grown with a random parameter, and the final output is obtained by aggregating over the ensemble (Gajjar and Zaveri, 2017). As the name suggests, it is a classifier that builds a number of decision trees on different subsets of the given dataset and combines them to improve the predictive accuracy on that dataset. Rather than depending on one decision tree, the random forest takes the prediction from each tree and predicts the final output based on the majority vote of those predictions.

K-Nearest Neighbour: K-Nearest Neighbour is one of the simplest machine learning algorithms and is based on supervised learning. It assumes similarity between new data and the available data and puts the new data into the category that is most similar to the available categories. KNN stores all the available data and classifies a new data point based on its similarity to them, so new data can easily be assigned to a well-suited category. It can be used for classification as well as regression, but it is mostly used for classification problems. At the training phase the KNN algorithm simply stores the dataset; when it receives new data, it classifies that data into the category most similar to the new data.

3.2 Deep Learning based Emotion Classification

Deep learning applies a multilayer approach to the hidden layers of the neural network. In machine learning approaches, features are defined and extracted either manually or by making use of feature selection methods. In deep learning, in contrast, features are learned and extracted automatically, achieving better accuracy and performance. Figure 3 shows an overview of the deep learning technique. Deep learning currently provides the best solutions to many problems in the fields of image and speech recognition, as well as in NLP.

Figure 3: Overview of applying deep learning techniques.

Feature Extraction: Feature extraction is the name for methods that combine and/or select variables into features, effectively reducing the amount of data that must be processed while still accurately and completely describing the original data set.

Word Embedding: Word embeddings are texts converted into numbers, and there may be different numerical representations of the same content. Most machine learning algorithms and deep learning architectures are unable to process strings or plain text in their raw form (NSS, 2017). They require numbers as input to perform any sort of work, be it classification, regression, etc. Moreover, with the huge amount of data that is present in text format, it is essential to extract knowledge from it and build applications (NSS, 2017). Therefore, word embeddings are used to convert all text documents into a numeric format.
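To make the classical pipeline concrete, the sketch below encodes the cleaned tweets with TF-IDF and trains one independent binary classifier per emotion label, as described for the machine learning approach above. It reuses the illustrative clean_tweet column from the preprocessing sketch; the choice of Naive Bayes as the base classifier and all parameters are assumptions, and SVM, Random Forest, or KNN can be substituted in the same loop.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import f1_score

# The eleven emotion labels of the SemEval-2018 E-c dataset.
EMOTIONS = ["anger", "anticipation", "disgust", "fear", "joy", "love",
            "optimism", "pessimism", "sadness", "surprise", "trust"]

# TF-IDF turns each cleaned tweet into a sparse numeric vector.
vectorizer = TfidfVectorizer(max_features=10000)
X = vectorizer.fit_transform(df["clean_tweet"])
y = df[EMOTIONS]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# One independent binary classifier per emotion label.
f1_per_label = {}
for emotion in EMOTIONS:
    clf = MultinomialNB()
    clf.fit(X_train, y_train[emotion])
    f1_per_label[emotion] = f1_score(y_test[emotion], clf.predict(X_test))
print(f1_per_label)
```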
Word2vec: Word2vec is a two-layer neural network that processes text (Pathmind Inc., 2022). It takes a text corpus as input, and its output is a set of vectors. Although it is not itself a deep neural network, it turns text into a numerical form that deep neural networks can process. The main purpose and usefulness of Word2vec is to group the vectors of similar words together in vector space (Pathmind Inc., 2022).
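As an illustration of this idea, the following gensim sketch trains word vectors on the tokenized tweets; the library choice, vector size, window, and query word are assumptions for illustration, not the configuration used in this work.

```python
from gensim.models import Word2Vec

# Tokenized tweets, reusing the illustrative clean_tweet column from above.
sentences = [tweet.split() for tweet in df["clean_tweet"]]

# Train a small skip-gram Word2vec model; words used in similar contexts
# end up with nearby vectors in the embedding space.
w2v = Word2Vec(sentences, vector_size=100, window=5, min_count=2, sg=1)

# Inspect the vector of a word (it must occur in the corpus) and its
# nearest neighbours in the learned vector space.
vector = w2v.wv["happy"]
print(w2v.wv.most_similar("happy", topn=5))
```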
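The deep learning model in this work is a GRU-based RNN trained with the Adam or RMSprop optimizer (Sections 1.1.2 and 3.2). The Keras sketch below shows a minimal multi-label model of this kind; the layer sizes, sequence length, and the use of a plain GRU layer without the attention mechanism mentioned earlier are simplifying assumptions rather than the authors' exact architecture.

```python
from tensorflow.keras import layers, models

VOCAB_SIZE = 20000   # assumed vocabulary size
MAX_LEN = 50         # assumed maximum tweet length in tokens
NUM_LABELS = 11      # one output per emotion label

# Embedding -> GRU -> sigmoid outputs, one probability per emotion.
model = models.Sequential([
    layers.Input(shape=(MAX_LEN,)),
    layers.Embedding(VOCAB_SIZE, 100),
    layers.GRU(128),
    layers.Dense(64, activation="relu"),
    layers.Dense(NUM_LABELS, activation="sigmoid"),
])

# Multi-label setup: sigmoid outputs with binary cross-entropy;
# the optimizer can be switched between "rmsprop" and "adam".
model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",
              metrics=["binary_accuracy"])
model.summary()
```

Each output unit is thresholded (e.g. at 0.5) to decide whether the corresponding emotion is present in a tweet.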
…to Naïve Bayes. For precision, the machine learning methods achieved better results than the deep learning methods. Among the deep learning models, the GRU-based RNN with the RMSprop optimizer (0.59) performed better than with the Adam optimizer (0.52). … shows the F1 score of the classifiers for each emotion category.
…prediction. The ensemble methods considered in this research are parallel in nature, which means that all the models are independent of each other. Figure 7 shows that the two ensemble techniques achieved the best results with respect to precision (0.818, 0.813), recall (0.829, 0.83), and F1 score (0.789, 0.799), averaged over all emotions, respectively. Moreover, both ensemble techniques perform better than any individual method. Figure 7 compares the performance metrics of the ensemble methods against the other individual algorithms.
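Since the ensembles combine independent per-label predictions in parallel, they can be expressed with a few NumPy operations. The sketch below is one possible formulation of majority voting and weighted averaging over predicted label probabilities; the base models, their weights, and the 0.5 decision threshold are illustrative assumptions, not the exact ensembles evaluated here.

```python
import numpy as np

def majority_vote(binary_preds):
    """binary_preds: list of (n_samples, n_labels) 0/1 arrays, one per model.
    A label is predicted when at least half of the models predict it."""
    stacked = np.stack(binary_preds)          # (n_models, n_samples, n_labels)
    return (stacked.mean(axis=0) >= 0.5).astype(int)

def weighted_average(probabilities, weights):
    """probabilities: list of (n_samples, n_labels) probability arrays.
    Averages them with the given weights and thresholds at 0.5."""
    stacked = np.stack(probabilities)
    w = np.asarray(weights, dtype=float).reshape(-1, 1, 1)
    averaged = (stacked * w).sum(axis=0) / w.sum()
    return (averaged >= 0.5).astype(int)

# Hypothetical usage with three base models' outputs of shape (n_samples, 11):
# y_vote = majority_vote([p_nb >= 0.5, p_svm >= 0.5, p_rf >= 0.5])
# y_wavg = weighted_average([p_nb, p_svm, p_rf], weights=[0.3, 0.4, 0.3])
```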
Overall, the machine learning methods achieved better performance for all evaluation parameters, but the GRU-based RNN with the RMSprop optimizer performed best in terms of accuracy, with the highest accuracy (0.823) among all classifiers. The results also show a large improvement compared to the results of Mohammed et al. (Jabreel and Moreno, 2019) for the same dataset. Figure 8 shows the comparison of all evaluation parameters across the different classifiers.
…significant performance differences (Rajaraman and Antani, 2020).
Table 2 summarizes the ANOVA test results for the performance metrics. It is observed that the p-values are lower than 0.05 for the performance metrics, which means that the methods differ statistically significantly (the null hypothesis H0 is rejected) when evaluated on the basis of these metrics. The F1 score is the harmonic mean of precision and recall; it is a better measure of incorrectly classified cases and is used when both precision and recall need to be kept high rather than focussing on only one of them. In this study, the mean F1 score is higher for the weighted average ensemble method (0.802) than for the majority voting ensemble method (0.789). This shows that the weighted average method proved to be the best model in terms of achieving a higher F1 score, and a model built using the weighted average method would yield a higher F1 score than the other methods.
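The one-way ANOVA reported here can be reproduced with SciPy as sketched below; the per-method F1 values are placeholders used only to show the call, not the measurements from this study.

```python
from scipy import stats

# Placeholder per-emotion F1 scores, one list per method; in the study these
# would be the metric values collected for each emotion label.
f1_naive_bayes = [0.71, 0.68, 0.74, 0.70]
f1_svm         = [0.79, 0.80, 0.78, 0.81]
f1_gru_rmsprop = [0.75, 0.77, 0.74, 0.76]

# One-way ANOVA: H0 = all methods have the same mean F1 score.
f_stat, p_value = stats.f_oneway(f1_naive_bayes, f1_svm, f1_gru_rmsprop)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Reject H0: the methods differ significantly on this metric.")
```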
…higher; recall increased from 0.56 to 0.82 using the Random Forest classifier, which is 26% (0.26) higher, and the F1 score increased from 0.56 to 0.798 using SVM, which is 23.8% (0.238) higher than the results of Mohammed et al. (Wieczorkowska et al., 2006) on the emotion classification dataset (SemEval-2018). The highest AUC value (0.84) was achieved by the GRU-based RNN with the RMSprop optimizer. For visualization, the Matplotlib library was used in a Jupyter Notebook to compare all the results of the machine learning and deep learning methods.

Future Work: In the future, the present analysis can be extended by adding more feature extraction parameters, and different models can be applied and tested on different datasets. The present research focusses on establishing relations between a tweet and its emotion labels; more research can be done in the direction of exploring relations between the phrases of a tweet and its emotion labels. Transfer learning with existing pre-trained models for classification, and data fusion from different data sources, are good directions to explore for improving robustness and accuracy. In this study, the dataset comes from a single source, Twitter, but other social networks can be used to create this type of dataset. For this research, the emotion classification dataset from the paper of Mohammed et al. was used, but a new dataset can be created to explore the same problem.

REFERENCES

Xiao Zhang, Wenzhong Li, Sanglu Lu. (2017). Emotion detection in online social network based on multi-label learning, Database Systems for Advanced Applications - 22nd International Conference, pp. 659-674.
Avetisyan, H., Bruna, O., Holub, J. (2016). Overview of existing algorithms for emotion classification. Uncertainties in evaluations of accuracies, Journal of Physics: Conference Series, vol. 772.
Yadollahi, A., Shahraki, A. G., Zaiane, O. R. (2017). Current State of Text Sentiment Analysis from Opinion to Emotion Mining, ACM Computing Surveys, pp. 1-25.
Rangel, F., Rosso, P. (2016). On the impact of emotions on author profiling, Information Processing & Management, 52, pp. 73-92.
Busso, C., Deng, Z., Yildirim, S., Bulut, M., Lee, C. M., Kazemzadeh, A., Lee, S., Neumann, U., Narayanan, S. (2004). Analysis of emotion recognition using facial expressions, speech, and multimodal information, In Proceedings of the 6th International Conference on Multimodal Interfaces, ACM, pp. 205-211.
Wieczorkowska, A., Synak, P., Raś, Z. W. (2006). Multi-label classification of emotions in music, In Intelligent Information Processing and Web Mining, Springer, pp. 307-315.
SemEval-2018 Task 1: Affect in Tweets (Emotion Classification Dataset): https://competitions.codalab.org/competitions/17751#learn_the_details-datasets
Mohammed, S. M., Bravo-Marquez, F., Salameh, M., Kiritchenko, S. (2018). SemEval-2018 Task 1: Affect in Tweets, In Proceedings of the 12th International Workshop on Semantic Evaluation, New Orleans, LA, USA, pp. 1-17.
Jabreel, M., Moreno, A. (2019). A Deep Learning-Based Approach for Multi-Label Emotion Classification in Tweets, Appl. Sci., 9:1123. doi: 10.3390/app9061123.
Manmohan Singh. (2020). Stop the stopwords using different python libraries, https://medium.com/towards-artificial-intelligence/stop-the-stopwords-using-different-python-libraries-ffa6df941653
Hemalatha, Saradhi Varma, G. P., Govardhan, A. (2013). Sentiment Analysis Tool using Machine Learning Algorithms, IJETTCS, Vol. 2, Issue 2.
García-Gonzalo, E., Fernández-Muñiz, Z., García Nieto, P. J., Bernardo Sánchez, A., Menéndez Fernández, M. (2016). Hard-Rock Stability Analysis for Span Design in Entry-Type Excavations with Learning Classifiers, 9, 531. DOI: https://doi.org/10.3390/ma9070531
Gajjar, R., Zaveri, T. (2017). Defocus blur radius classification using random forest classifier, 2017 International Conference on Innovations in Electronics, Signal Processing and Communication (IESC), pp. 219-223. DOI: https://doi.org/10.1109/IESPC.2017.8071896
NSS. (2017). An intuitive understanding of Word Embedding: From Count vectors to word2vec, https://www.analyticsvidhya.com/blog/2017/06/word-embeddings-count-word2veec
Pathmind Inc. (2022). A Beginner's guide to word2vec and neural word embeddings, https://wiki.pathmind.com/word2vec
Konstadinov, S. (2017). Understanding GRU networks, https://towardsdatascience.com/understanding-gru-networks-2ef37df6c9be
Rajaraman, S., Antani, S. K. (2020). Modality-specific deep learning model ensembles toward improving TB detection in chest radiographs, IEEE Access, vol. 8, pp. 27318-27326. DOI: 10.1109/access.2020.2971257.