Twitter Sentiment Analysis Using Hybrid Gated Attention Recurrent Network
*Correspondence: nikhat0891@gmail.com
1 Department of Computer Science & Engineering, Koneru Lakshmaiah Education Foundation, Guntur‑Dt, Vaddeswaram, Andhra Pradesh, India
2 Ton Duc Thang University, Ho Chi Minh, Vietnam
3 ITM SLS Baroda University, Vadodara, Gujarat, India
4 Data Science Laboratory, Faculty of Information Technology, Industrial University of Ho Chi Minh, Ho Chi Minh, Vietnam
5 Department of Computer Science & Engineering, St. Peter’s Engineering College, Hyderabad, India
Abstract
Sentiment analysis is one of the most active areas of ongoing research in data mining. Among the many social media platforms developed in recent years, Twitter is a significant tool for sharing and acquiring people’s opinions, emotions, views, and attitudes towards particular entities. This has made sentiment analysis an important task in the natural language processing (NLP) domain. Different techniques have been developed for sentiment analysis, yet there is still room for improvement in accuracy and system efficacy. To address this, the proposed architecture combines efficient optimization-based feature selection with deep learning-based sentiment analysis. In this work, the Sentiment140 dataset is used to analyse the performance of the proposed gated attention recurrent network (GARN) architecture. First, the dataset is pre-processed to clean and filter it. Then, a term weight-based feature extraction model termed Log Term Frequency-based Modified Inverse Class Frequency (LTF-MICF) is used to extract sentiment-based features from the pre-processed data. In the third phase, a hybrid mutation-based white shark optimizer (HMWSO) is introduced for feature selection. Using the selected features, tweets are classified into the sentiment classes positive, negative, and neutral by the GARN architecture, which combines recurrent neural networks (RNN) and attention mechanisms. Finally, the proposed classifier is compared with existing classifiers. The proposed GARN achieves an accuracy of 97.86%, a precision of 96.65%, a recall of 96.76% and an F-measure of 96.70%.
Keywords: Deep learning, Term weight-feature extraction, White shark optimizer,
Twitter sentiment, Gated recurrent attention network, Natural language processing,
Recurrent neural network
Introduction
Sentiment Analysis (SA) uses text analysis, NLP (Natural Language Processing), and statistics to evaluate users’ sentiments. SA is also called emotion AI or opinion mining [1]. The term ‘sentiment’ refers to feelings, thoughts, or attitudes expressed about a person, situation, or thing. SA is an NLP technique used to identify whether the obtained data or information is positive, neutral or negative. Business experts frequently use it to monitor sentiment, gauge brand reputation, analyse social data and understand customer needs [2, 3]. Over recent years, the amount of information uploaded or generated online has rapidly increased due to the enormous number of Internet users [4, 5].
© The Author(s) 2023, corrected publication 2023. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Parveen et al. Journal of Big Data (2023) 10:50
Globally, with the emergence of technology, people use social media sites [6, 7] such as Twitter, Instagram, Facebook, LinkedIn and YouTube to express their views or opinions about products, events or targets. Twitter is currently a global micro-blogging platform widely preferred by users for sharing their opinions as short messages called tweets [8]. Twitter has 152 M (million) daily active users and 330 M monthly active users, with 500 M tweets sent daily [9]. Tweets therefore form a vast source of sentiment data for analysis. Twitter is an effective OSN (online social network) for disseminating information and for user interaction, and Twitter sentiments significantly influence diverse aspects of our lives [10]. SA and text classification aim to extract information from text and categorize its polarity as positive (P), negative (N) or neutral (Ne).
NLP techniques are often used to retrieve information from text or tweet content. NLP-based sentiment classification is the procedure in which a machine (computer) extracts the meaning of each sentence written by a human. Manual TSA (Twitter Sentiment Analysis) is time-consuming and requires many experts to label tweets. Automated models have therefore been developed to overcome these challenges. ML (Machine Learning) algorithms [11, 12] such as SVM (Support Vector Machine), MNB (Multinomial Naïve Bayes), LR (Logistic Regression) and NB (Naïve Bayes) have been used to analyse online sentiments. Although these methods perform well, they are very slow and need a long time to train.
DL (Deep Learning) models have been introduced to classify Twitter sentiments effectively. DL is a subset of ML that uses multiple processing layers to solve complicated problems. It processes data through a chain of progressive stages, which allows the machine to handle vast amounts of data with little human interaction. DL-based sentiment analysis offers accurate results and can be applied in areas such as movie recommendation, product prediction and emotion recognition [13–15]. Such innovations have motivated several researchers to apply DL to Twitter sentiment analysis.
Motivation
SA (Sentiment Analysis) is concerned with recognizing and classifying the polarity of, or opinions in, text data. Nowadays, people widely share their opinions and sentiments on social sites. A massive amount of data is therefore generated online, and mining it effectively is essential for retrieving quality information. Analysing online sentiments can yield an aggregate opinion on particular products. However, TSA (Twitter Sentiment Analysis) is challenging for multiple reasons. Short texts (tweets), which result from the maximum character limit, are a major issue. Misspellings, slang and emoticons in tweets require an additional pre-processing step to filter the raw data. Selecting a suitable feature extraction model is also challenging, and this choice further affects sentiment classification. Therefore, this work aims to develop a new feature extraction and selection approach integrated with a hybrid DL classification model for accurate tweet sentiment classification.

Existing research works on DL-based TSA [16–21] have not attained significant results because they use smaller datasets and rely on slow manual text labelling. Moreover, datasets containing unwanted details and spaces reduce the efficiency of the classification algorithm. The high dimensionality of the extracted features further degrades the efficiency of a DL approach. To overcome these issues, this work aims to develop an effective DL algorithm for Twitter SA.

Pre-processing is a major contributor to this architecture. It enhances DL efficiency by removing unwanted details from the dataset, and it also reduces the processing time of the feature extraction algorithm. Following that, an optimization-based feature selection process is introduced, which reduces the effort spent analysing irrelevant features. Unlike existing algorithms, the proposed GARN can efficiently analyse text-based features. Combining the attention mechanism with DL further enhances the overall efficiency of the proposed algorithm, because the attention mechanism learns the selected features more effectively while reducing model complexity. This advantage motivates integrating the attention mechanism with an RNN to achieve effective performance.
Objectives
The major objectives of the proposed research are:
Related works
Some of the works related to DL‑based Twitter sentiment analysis are:
Alharbi et al. [16] presented an analysis of Twitter sentiments using a DNN (deep neural network) approach, namely a CNN (Convolutional Neural Network). Tweets were classified based on two aspects: social activities and personality traits. Sentiment (P, N or Ne) analysis was performed with the CNN model, whose input layer takes the feature lists and pre-trained word embeddings (Word2Vec). The two datasets used were SemEval-2016_1 and SemEval-2016_2. The CNN obtained an accuracy of 88.46%, which was higher than that of the existing methods: LSTM (86.48%), SVM (86.75%), KNN (k-nearest neighbour) (82.83%), and J48 (85.44%).
Tam et al. [17] developed a Convolutional Bi-LSTM model for sentiment classification on Twitter data. The integrated CNN-Bi-LSTM model was characterized by its extraction of local high-level features. The input layer received the text input and split it into tokens, and each token was transformed into NV (numeric values). Next, pre-trained WE (word embeddings), namely GloVe and W2V (Word2Vec), were used to create the word vector matrix. Important words were extracted using the CNN model, and the feature set was further reduced using a max-pooling layer. The Bi-LSTM (backward and forward) layers were used to learn the textual context. A dense layer (DeL) placed after the Bi-LSTM layer connected the input data to the output through weights. Performance was evaluated on the TLSA (Twitter Label SA) and SST-2 (Stanford Sentiment Treebank) datasets. The model achieved an accuracy of 94.13% on TLSA and 91.13% on SST-2.
Chugh et al. [18] developed an improved DL model for information retrieval and sentiment classification. The hybrid optimization algorithm SMCA integrates SMO (Spider Monkey Optimization) and CSA (Crow Search Algorithm). The presented DRNN (DeepRNN) was trained using SMCA. Sentiment categorization was performed with DeepRNN-SMCA, and information retrieval was performed with FuzzyKNN. The datasets used were an Amazon mobile reviews dataset and a telecom tweets dataset. For sentiment classification, the accuracy was 0.967 on the first dataset and 0.943 on the second. For IR (information retrieval), the accuracy was 0.831 on dataset 1 and 0.883 on dataset 2.
Alamoudi et al. [19] performed aspect-based SA and sentiment classification based on WE (word embeddings) and DL. The sentiment categorization involved both ternary and binary classes. First, the YELP review dataset was prepared and pre-processed for classification. Features were extracted with TF-IDF, BoW and GloVe WE. NB and LR were used to model the first feature set (TF-IDF and BoW features). The GloVe features were then modelled using diverse models such as ALBERT, CNN and BERT for ternary classification. Next, aspect- and sentence-based binary SA was executed. The WE vectors for sentences and aspects were obtained with the GloVe approach. The similarity between aspect and sentence vectors was measured using cosine similarity, and binary aspects were classified. The highest accuracy (98.308%) was obtained with the ALBERT model on the YELP 2-class dataset, whereas the BERT model gained 89.626% accuracy on the YELP 3-class dataset.
Tan et al. [20] introduced a hybrid of the robustly optimized BERT approach (RoBERTa) and LSTM for analysing sentiment data with a transformer and an RNN. The textual data was processed with word embedding, and subword tokenization was performed with the RoBERTa model. Long-distance Tm (temporal) dependencies were encoded using the LSTM model. DA (data augmentation) based on pre-trained word embedding was developed to synthesize multiple lexical samples and to oversample the minority class. DA addresses the problem of an imbalanced dataset by providing more lexical training samples. The Adam optimization algorithm was used to
Table 1 Comparative analysis of Twitter sentiment analysis using DL techniques
| Reference | Technique | Merits | Demerits |
| Alharbi et al. 2019 [16] | CNN (Convolutional Neural Network) | The behavioural information of the user is included | Difficult to interpret the exact tweet from a group of tweets |
| Tam et al. 2021 [17] | Hybrid CNN-BiLSTM (convolutional neural network and bidirectional long short-term memory) | High performance of the word embedding techniques | Lower classification and retrieval accuracy |
| Chugh et al. 2021 [18] | DRNN (DeepRNN), SMO (Spider Monkey Optimization) and CSA (Crow Search Algorithm) | Provides better reviews for effective decision-making | Lower performance accuracy |
| Alamoudi et al. 2021 [19] | Convolutional neural network (CNN), BERT and ALBERT models | Reduction in error rate | Occurrence of mislabelled reviews |
| Tan et al. 2022 [20] | BERT approach (RoBERTa) with LSTM | Optimization is done using the word embedding technique | Lower classification accuracy |
| Hasib et al. 2021 [21] | CNN (Convolutional neural network) and DNN | Collected data on the emotions of airline consumers | Fewer tweets are used |
| Guedes, G.P. 2020 [24] | UWS, SWS | High efficiency of the proposed method on the dataset used | High error obtained |
feature extraction, TF-IDF vector is utilized and Binary Brain Storm Optimization
(BBSO) is used to select the relevant features from the group of vectors. Finally, the
incidence of both positive and negative sentiments is classified using Fuzzy Cognitive
Maps (FCMs). Table 1 shows the comparative analysis of Twitter sentiment analysis
using DL techniques.
Problem statement
Twitter sentiment analysis using DL techniques faces many problems. The authors of [16] used a DL model to perform sentiment classification on Twitter data, analysing each user’s behavioural information. However, this method struggled to interpret exact tweet words from the massive tweet corpus, which reduced the efficiency of the classification algorithm. ConvBiLSTM was introduced in [17] and used GloVe- and Word2Vec-based features for sentiment classification. However, the extracted features were not sufficient to achieve satisfactory accuracy. Reducing processing time was a major objective in [18], which uses DeepRNN for sentiment classification. However, it fails to reduce the dimensionality of the extracted features, which causes several valuable features to fall into a local optimum. DL and word embedding were combined in [19] to process Yelp reviews. This approach performed efficiently for two classes but did not provide good accuracy for three-class classification. More recently, a hybrid LSTM architecture developed in [20] showed flexible processing for sentiment classification but took a huge amount of time to process large datasets. DNN-based feature extraction and CNN-based sentiment classification were performed in [21]. That approach did not outperform other algorithms, and it considered only two classes.
Several existing works fail to achieve efficient processing time, low complexity and high accuracy on large datasets. Further, the extraction of low-level and unwanted features reduces the efficiency of the classifier, and using all extracted features produces a high-dimensional feature space. These demerits make existing algorithms unsuitable for efficient processing. These shortcomings open a research space for an efficient combined algorithm for Twitter data analysis.

To overcome these issues, the proposed architecture combines an RNN with an attention mechanism. The features required for classification are extracted using LTF-MICF, which provides features suited to Twitter processing. The dimensionality of the large set of extracted features is then reduced using the HMWSO algorithm. This algorithm processes the features with low time complexity and performs better optimal feature selection. The selected features enhance the performance of the proposed classifier on the large dataset and achieve high accuracy with a low misclassification error rate.
Proposed methodology
For sentiment classification of tweets, a DL technique called the gated attention recurrent network (GARN) is proposed. The publicly accessible Twitter dataset of sentiment tweets (Sentiment140) is first collected and given as input. The next stage pre-processes the tweets. This stage covers the following steps:
• tokenization and stopword removal
• stemming, and slang and acronym correction
• removal of numbers, punctuation and symbols
• conversion of uppercase to lowercase
• replacement of elongated characters
• removal of URLs, hashtags and user mentions

The pre-processed dataset then acts as input for the next process. Based on term frequency, a term weight is allocated to each term in the dataset using the Log Term Frequency-based Modified Inverse Class Frequency (LTF-MICF) extraction technique. Next, the Hybrid Mutation based White Shark Optimizer (HMWSO) is used to select the optimal term weights. Finally, the output of HMWSO is fed into the GARN, which classifies the sentiment into three classes. Figure 1 shows a diagrammatic representation of the proposed methodology.
Tweets pre‑processing
Twitter users post their tweets in different styles, so pre-processing converts the raw data into short, clean text. This allows further processes such as classification, unwanted-news detection and sentiment analysis to be performed. Some users write tweets with abbreviations, symbols, URLs, hashtags and punctuation. Tweets may also contain emojis, emoticons or stickers that express the user’s sentiments and feelings. Sometimes tweets take a hybrid form that combines abbreviations, symbols and URLs. Such symbols, abbreviations and punctuation should be removed from the tweets before the dataset is classified further. The pre-processing steps applied to the tweet dataset are:
• tokenization, stopword removal and stemming
• slang and acronym correction
• removal of numbers, punctuation and symbols
• noise removal, and URL and hashtag removal
• replacement of elongated characters
• conversion of upper case to lower case
• lemmatization
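The following is a minimal sketch of several cleaning steps listed above, using only Python’s `re` module. It covers lower-casing, URL removal, user-mention removal, hashtag removal, replacement of elongated characters and slang/acronym correction. Several details are illustrative assumptions, not the authors’ exact rules:
• the regular expressions
• the choice to cap repeated characters at two
• the tiny hypothetical `SLANG` dictionary

A real system would use a curated slang lexicon.

```python
import re

# Hypothetical slang/acronym dictionary (illustration only).
SLANG = {"u": "you", "gr8": "great", "idk": "i do not know"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
ELONGATED_RE = re.compile(r"(\w)\1{2,}")   # 3+ repeats of one character


def clean_tweet(text: str) -> str:
    """Apply a subset of the tweet-cleaning steps described above."""
    text = text.lower()                       # upper case -> lower case
    text = URL_RE.sub(" ", text)              # URL removal
    text = MENTION_RE.sub(" ", text)          # user-mention removal
    text = HASHTAG_RE.sub(" ", text)          # hashtag removal
    text = ELONGATED_RE.sub(r"\1\1", text)    # "sooooo" -> "soo"
    words = [SLANG.get(w, w) for w in text.split()]   # slang/acronym correction
    return " ".join(words)


print(clean_tweet("@bob Sooooo HAPPY with u gr8 #blessed https://t.co/xyz"))
```

Removing whole hashtags follows the text above. Keeping the hashtag word and dropping only `#` is a common alternative when hashtags carry sentiment.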
Tokenization
Tokenization [28] is the process of splitting text into small units such as words, symbols, phrases and other meaningful elements known as tokens. These tokens are the input for further processing. Tokenization also helps to identify meaningful words. How difficult it is depends largely on the language. In languages such as English and French, words are usually separated by white space, whereas in languages such as Chinese and Thai they are not. Tokenization is carried out with the NLTK Python library. In this phase, the data is processed in three steps:
1. The text document is converted into word counts.
2. The data is cleansed and filtered.
3. The document is split into tokens or words.
The example below illustrates a tweet before and after tokenization:
Before tokenization
Deep learning is a technology which trains the machine to behave naturally like a human being.
After tokenization
Deep, learning, is, a, technology, which, trains, the, machine, to, behave, naturally, like, a, human, being.
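The paper performs tokenization with NLTK, for example with `nltk.word_tokenize` (which requires NLTK’s tokenizer models to be downloaded first) or with `nltk.tokenize.TweetTokenizer` for tweet-specific tokens. As a dependency-free approximation, not NLTK’s exact behaviour, the regular expression below splits the example sentence into runs of word characters and single punctuation marks.

```python
import re

# A token is a run of word characters or a single non-space punctuation mark.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text: str) -> list:
    """Split text into word and punctuation tokens."""
    return TOKEN_RE.findall(text)


tokens = tokenize("Deep learning is a technology which trains the machine "
                  "to behave naturally like a human being.")
print(tokens)
```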
Numerous tools are available to tokenize a text document. Some of them are as
follows;
Stopwords removal
Stopword removal [28] is the process of removing frequently used words that carry little meaning in a text document. Stopwords such as are, this, that, and and so occur frequently in sentences. They are typically pronouns, articles, prepositions and conjunctions. Such words are not useful for further processing, so they must be removed. If they are not removed, the sentence appears heavy and becomes less informative for the analyst. They are also not considered keywords in Twitter analysis applications. Many methods exist to remove stopwords from a document, including:
• Z-methods
• Classic method
• Mutual information (MI) method
• Term based random sampling (TBRS) method
d_l(t) = Q_l(t) \cdot \log_2 \frac{Q_l(t)}{Q(t)} \qquad (1)
where Q_l(t) is the normalized term frequency (NTF) of term t within a chunk l, and Q(t) is the NTF of term t in the entire document. Using this equation, the terms with the lowest weights form the stopword list, from which duplicates are removed.
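The sketch below evaluates Eq. (1) in a simplified, deterministic variant of TBRS. TBRS randomly samples chunks; this sketch instead scores every fixed-size chunk of the token stream. It then averages d_l(t) per term and returns the k lowest-weighted terms as stopwords. The chunk size, k and the averaging rule are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict


def kl_stopwords(tokens, chunk_size=10, k=1):
    """Return the k terms with the lowest average d_l(t) (Eq. 1)."""
    total, n = Counter(tokens), len(tokens)
    scores = defaultdict(list)
    for start in range(0, n, chunk_size):
        chunk = tokens[start:start + chunk_size]
        for term, cnt in Counter(chunk).items():
            q_l = cnt / len(chunk)                # Q_l(t): NTF inside chunk l
            q = total[term] / n                   # Q(t): NTF in the whole text
            scores[term].append(q_l * math.log2(q_l / q))   # d_l(t)
    avg = {t: sum(s) / len(s) for t, s in scores.items()}
    return sorted(avg, key=avg.get)[:k]           # least informative first


def remove_stopwords(tokens, stopwords):
    """Drop every token that appears in the stopword list."""
    stop = set(stopwords)
    return [t for t in tokens if t not in stop]


tokens = ("the apple " * 10 + "the banana " * 10).split()
print(kl_stopwords(tokens, chunk_size=10, k=1))
```

A term spread evenly across all chunks (here "the") has Q_l(t) = Q(t) and therefore a weight of zero. Topic words concentrated in a few chunks receive positive weights.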
Stemming
Stemming removes prefixes and suffixes from a word. It can also be defined as detecting the root, or stem, of a word by stripping its affixes. For example, the words processed and processing can both be stemmed to the single form process [28]. Two points must be considered when stemming. First, words with different meanings must be kept separate. Second, morphological variants of a word that share the same meaning must be mapped to the same stem. Stemming algorithms are divided into three categories: truncating, statistical, and mixed methods. Truncating methods remove suffixes from words. For example, a set of rules is applied to strip plural suffixes and convert a plural word into its singular form.
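As a concrete example of the truncating approach, the plural-suffix rules of step 1a of Porter’s stemmer can be written as follows. The rules are SSES → SS, IES → I, SS → SS and S → removed. The sketch covers only this one step. The full algorithm is available, for example, as `nltk.stem.PorterStemmer`.

```python
def porter_step_1a(word: str) -> str:
    """Porter step 1a: strip plural suffixes, longest matching rule first."""
    if word.endswith("sses"):
        return word[:-2]      # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]      # ponies -> poni
    if word.endswith("ss"):
        return word           # caress -> caress (unchanged)
    if word.endswith("s"):
        return word[:-1]      # cats -> cat
    return word


print([porter_step_1a(w) for w in ["caresses", "ponies", "caress", "cats"]])
```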
Several stemmer algorithms fall under the truncating method, including the Lovins stemmer, the Porter stemmer, the Paice/Husk stemmer, and the Dawson stemmer.
• Lovins stemmer: removes the longest suffix from a word. Its drawback is that it takes more time to process.
• Porter’s stemmer: removes suffixes by applying a series of rules. If a rule’s condition is satisfied, the suffix is removed. The algorithm consists of 60 rules and is faster than the Lovins algorithm.
• Paice/Husk stemmer: an iterative algorithm with 120 rules indexed by the last character of the suffixed word. It performs two operations, deletion and replacement.
• Dawson stemmer: stores suffixes in reverse order, indexed by their length and last character.

Statistical methods include the N-gram, HMM, and YASS stemmers. Mixed methods use inflectional and derivational approaches.
Removal of numbers
Removing numbers from the Twitter dataset is the process of deleting numbers that occur between words in a sentence [29].
An example illustrates the original tweet before and after removing numbers.
Before removal: My ink “My Way…No Regrets” Always Make Happiness Your #1
Priority.
After removal: My ink “My Way … No Regrets” Always Make Happiness Your #
Priority.
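Once removed, the tweet no longer contains any numbers. Punctuation removal deletes all punctuation marks from the tweet. An example illustrates the original tweet before and after removing punctuation.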
Before removal: My ink “My Way…No Regrets” Always Make Happiness Your #1
Priority.
After removal: My ink My Way No Regrets Always Make Happiness Your Priority.
After removal, the tweet will not contain any punctuation. Symbol removal is the pro-
cess of removing all the symbols from the tweet.
An example illustrates the original tweet before and after removing symbols.
Before removal: wednesday addams as a disney princess keeping it
.
After removal: wednesday addams as a disney princess keeping it.
After removal, there would not be any symbols in the tweet.
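The number, punctuation and symbol removal steps illustrated above can be sketched with regular expressions. Here "punctuation and symbols" is assumed to mean every character that is neither a word character nor whitespace, which also covers emoji. Removed characters are replaced by spaces so that words such as "Way…No" are not glued together.

```python
import re

TWEET = "My ink “My Way…No Regrets” Always Make Happiness Your #1 Priority."


def _squeeze(text: str) -> str:
    """Collapse runs of whitespace and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


def remove_numbers(text: str) -> str:
    """Delete every run of digits."""
    return _squeeze(re.sub(r"\d+", "", text))


def remove_punctuation_and_symbols(text: str) -> str:
    """Replace anything that is not a word character or whitespace with a space."""
    return _squeeze(re.sub(r"[^\w\s]", " ", text))


print(remove_numbers(TWEET))
print(remove_punctuation_and_symbols(remove_numbers(TWEET)))
```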
heavyweight in the document. The proposed hybrid feature extraction technique overcomes this issue by integrating the term frequency f_T with MICF, an effective term weighting (T_w) approach. The inverse class frequency f_Ci is based on the ratio of the total number of classes to the number of classes in which a term occurs in the training tweets. The algorithm for the TW-FE technique is shown in Algorithm 1 [22].
Calculating the LTF, l_Tf, involves two steps. The first step calculates the term frequency f_T of each term in the pre-processed dataset. The second step applies log normalization to the computed f_T values. The MICF, a modified version of f_Ci, is then calculated for each term in the document. To compute MICF, each term is given different class-specific ranks, and these ranks contribute differently to the total term rank. Dissimilar weights must therefore be assigned to dissimilar class-specific ranks. Consequently, the sum of the weighted class-specific ranks is used as the total term rank. The proposed formula for T_w using LTF-based MICF is as follows [22]:
\mathrm{LTF\text{-}MICF}(t_p) = l_{Tf}(t_p) \times \sum_{r=1}^{m} \left( w_{sp} \cdot f_{Ci}(t_p) \right) \qquad (2)
where w_{sp} is a specific weighting factor for each t_p with respect to class C_r, defined as:
w_{sp} = \log\left(1 + \frac{s_i^{\vec{t}} \cdot s_i^{\tilde{t}}}{\max\left(1, s_i^{\overleftarrow{t}}\right) \cdot \max\left(1, s_i^{\frown{t}}\right)}\right) \qquad (3)
The method used to assign weights to a given dataset is known as the weighting factor (WF). The four counts in Eq. (3) are defined as follows:
• s_i^{\vec{t}}: the number of tweets s_i in class C_r that contain the pre-processed term t_p
• s_i^{\overleftarrow{t}}: the number of tweets in other classes that contain t_p
• s_i^{\frown{t}}: the number of tweets in class C_r that do not contain t_p
• s_i^{\tilde{t}}: the number of tweets in other classes that do not contain t_p

The constant ‘1’ added inside the logarithm eliminates negative weights. In extreme cases, to avoid a zero denominator, the minimal denominator is set to ‘1’ if s_i^{\overleftarrow{t}} = 0 or s_i^{\frown{t}} = 0. The formulas for l_{Tf}(t_p) and f_{Ci}(t_p) are as follows [22]:
l_{Tf}(t_p) = \log\left(1 + f_T(t_p, s_i)\right) \qquad (4)
where f_T(t_p, s_i) is the raw count of t_p in s_i, i.e., the number of times t_p occurs in s_i.
f_{Ci}(t_p) = \log\left(1 + \frac{r}{C(t_p)}\right) \qquad (5)
where r is the total number of classes and C(t_p) is the number of classes in which t_p occurs. After term weighting (T_w), the dataset features are represented as f_j = {f_1, f_2, ..., f_j, ..., f_m}, where f_1, f_2, ..., f_m are the weighted terms of the pre-processed dataset. The computed rank values of each term in the tweet documents are used in the subsequent process.
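Eqs. (2)–(5) can be combined into a short term-weighting routine. This sketch makes three assumptions:
• The tweets are whitespace-tokenized and pre-processed, with one label per tweet.
• The logarithm is the natural logarithm, since the text does not state the base.
• f_Ci(t_p) does not depend on the class index r, so it can be factored out of the sum in Eq. (2).

```python
import math
from collections import Counter


def ltf_micf(tweets, labels):
    """Return one {term: LTF-MICF weight} dict per tweet (Eqs. 2-5)."""
    classes = sorted(set(labels))
    docs = [set(t.split()) for t in tweets]
    n_total = len(labels)
    n_in_class = Counter(labels)                  # tweets per class
    micf = {}
    for term in set().union(*docs):
        # number of tweets containing `term`, per class
        hits = Counter(lab for doc, lab in zip(docs, labels) if term in doc)
        containing = sum(hits.values())
        weight_sum = 0.0
        for r in classes:
            a = hits[r]                           # class r, contains term
            b = containing - a                    # other classes, contain term
            c = n_in_class[r] - a                 # class r, lacks term
            d = (n_total - n_in_class[r]) - b     # other classes, lack term
            weight_sum += math.log(1 + a * d / (max(1, b) * max(1, c)))  # Eq. (3)
        icf = math.log(1 + len(classes) / len(hits))   # Eq. (5): r / C(t_p)
        micf[term] = weight_sum * icf             # sum_r w_sp * f_Ci(t_p)
    weights = []
    for tweet in tweets:
        counts = Counter(tweet.split())
        # Eq. (4) multiplied by the MICF factor, as in Eq. (2)
        weights.append({t: math.log(1 + n) * micf[t] for t, n in counts.items()})
    return weights


w = ltf_micf(["good great movie", "great fun", "bad boring movie", "awful bad"],
             ["pos", "pos", "neg", "neg"])
print(w[0])
```

In this toy corpus, "great" occurs only in positive tweets, while "movie" occurs in both classes. As intended, "great" therefore receives a larger weight in the first tweet.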
Feature selection
Irrelevant features in the data can reduce the accuracy of the classification process and cause the model to learn those irrelevant features. Avoiding this is an optimization problem, which can be addressed only by selecting an optimal subset of features from the processed dataset. Therefore, a feature selection algorithm is used: the White Shark Optimizer with a hybrid mutation strategy (HMWSO).
where p = 1, 2, ..., m is the index of a white shark in a population of size m. The symbols in the velocity update are defined as follows:
• v_{s+1}^p: the new velocity of the pth shark in the (s + 1)th step
• v_s^p: the current velocity of the pth shark in the sth step
• W_{gbest_s}: the global best position achieved by any shark in the sth step
• W_s^p: the current position of the pth shark in the sth step
• W_{best}^{v_s^p} and v_c^i: the best position of the pth shark and the index vector for attaining that best position
• C_1 and C_2: uniform random numbers in the interval [0, 1]
• F_1 and F_2: the forces of the shark that control the effect of W_{gbest_s} and W_{best}^{v_s^p} on W_s^p
• µ: the convergence factor of the shark

The index vector of the white shark is represented as:
where rand(1, t) is a vector of random numbers drawn from a uniform distribution in the interval [0, 1]. The forces of the shark that control this effect are represented as follows:
F_1 = F_{max} + (F_{max} - F_{min}) \times e^{-(4u/U)^2} \qquad (9)
F_2 = F_{min} + (F_{max} - F_{min}) \times e^{-(4u/U)^2} \qquad (10)
where u and U denote the current and maximum number of iterations, and F_{min} and F_{max} denote the initial and subordinate velocities of the white shark. The convergence factor is represented as:
\mu = \frac{2}{\left| 2 - \tau - \sqrt{\tau^2 - 4\tau} \right|} \qquad (11)
where τ is defined as the acceleration coefficient. The strategy for updating the position
of the white shark is represented as follows;
W_{s+1}^{p} = \begin{cases} W_s^p \cdot \neg\oplus W_o + up \cdot c + lo \cdot d, & rand < MV \\ W_s^p + v_s^p / fr, & rand \geq MV \end{cases} \qquad (12)
where W_{s+1}^p is the new position of the pth shark in the (s + 1)th iteration, ¬ is the negation operator, and c and d are binary vectors. lo and up denote the lower and upper bounds of the search space. W_o and fr denote the logical vector and the frequency at which the shark moves. The binary and logical vectors are expressed as follows:
W_o = \oplus(c, d) \qquad (15)
fr = fr_{min} + \frac{fr_{max} - fr_{min}}{fr_{max} + fr_{min}} \qquad (16)
where fr_{max} and fr_{min} are the maximum and minimum frequency rates. The movement force MV, which increases with each iteration, is represented as:
MV = \frac{1}{c_0 + e^{(S/2 - s)/c_1}} \qquad (17)
where W'^{p}_{s+1} denotes the updated position of the pth white shark as it follows the food source, and sgn(r_2 − 0.5) yields 1 or −1 to change the search direction. The distance Dis_w between the food source and the shark, and the strength Str_sns with which a white shark follows other sharks close to the food source, are formulated as follows:
Str_{sns} = 1 - e^{-c_2 \times s/S} \qquad (20)
The initial best optimal solutions are kept constant, and the positions of the other sharks are updated according to these two constant optimal solutions. The fish-school behaviour of the sharks is formulated as follows:
W_{s+1}^{p} = \frac{W_s^p + W'^{p}_{s+1}}{2 \times rand} \qquad (21)
where fit_q is the fitness of the qth term in the text document. The expansion of the equation is represented as:
w_e^{p} = \frac{1}{m - 1} \cdot \frac{fit_1 + fit_2 + \cdots + fit_{q-1} + fit_{q+1} + \cdots + fit_{m-1} + fit_m}{fit_1 + fit_2 + \cdots + fit_{q-1} + fit_q + fit_{q+1} + \cdots + fit_{m-1} + fit_m} \qquad (23)
A hybrid mutation (HM) is incorporated into the WSO to speed up convergence. The hybrid mutation applied with the optimizer is represented as:
{}^{t+1}GM = W_q^{new} + D_1 \cdot G_a(\mu, \sigma) \qquad (25)
{}^{t+1}CM = W_q^{new} + D_2 \cdot C_a(\mu', \sigma') \qquad (26)
where G_a(µ, σ) and C_a(µ′, σ′) are random numbers drawn from the Gaussian and Cauchy distributions, respectively. (µ, σ) and (µ′, σ′) are the parameters of these distributions: the mean and variance for the Gaussian distribution, and the location and scale for the Cauchy distribution. D_1 and D_2 are the coefficients of the Gaussian mutation ^{t+1}GM and the Cauchy mutation ^{t+1}CM. Applying these two hybrid mutation operators produces a new solution, represented as:
where,
Wqnew
(28)
PS
pwe = y=1
PS
p
whereas we represents the weight vector and PS represents the size of the population.
The features selected from the extracted features are denoted $Sel_p$ ($p = 1, 2, \ldots, m$). The WSO output, $\{sel\} = \{sel_1, sel_2, \ldots, sel_m\}$, is a new subset of the terms in the dataset, where $m$ is the number of selected features. The feature selection stage therefore yields a dataset whose documents are represented by the optimal features.
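The two mutation operators in Eqs. 25 and 26 can be sketched with NumPy's random `Generator`. This is a sketch under stated assumptions: `hybrid_mutation`, the coefficient defaults and the rule of keeping the fitter of the two mutated candidates are hypothetical, because the paper's rule for combining them (Eq. 27) is not reproduced here. NumPy's `normal` takes a standard deviation rather than a variance, and the Cauchy term is a standard Cauchy draw shifted by `loc` and scaled by `scale`.

```python
import numpy as np


def hybrid_mutation(w_new, fitness, rng, d1=0.1, d2=0.1,
                    mu=0.0, sigma=1.0, loc=0.0, scale=1.0):
    """Gaussian (Eq. 25) and Cauchy (Eq. 26) mutation of one solution W_q^new.

    Keeping the fitter candidate (lower fitness is better) is a hypothetical
    combination rule, not the paper's Eq. 27.
    """
    w_new = np.asarray(w_new, dtype=float)
    # Eq. 25: GM = W_q^new + D1 * Ga(mu, sigma); numpy's scale argument is the std. dev.
    gm = w_new + d1 * rng.normal(mu, sigma, size=w_new.shape)
    # Eq. 26: CM = W_q^new + D2 * Ca(mu', sigma'), a shifted and scaled standard Cauchy draw.
    cm = w_new + d2 * (loc + scale * rng.standard_cauchy(size=w_new.shape))
    return gm if fitness(gm) <= fitness(cm) else cm


def sphere(x):
    """Toy fitness function for the usage example (hypothetical)."""
    return float(np.sum(x ** 2))


rng = np.random.default_rng(0)
candidate = hybrid_mutation(np.full(5, 0.5), sphere, rng)
```

The heavy tails of the Cauchy draw occasionally produce large jumps that can help escape local optima, while the Gaussian draw makes smaller local refinements; pairing the two is the usual motivation for a hybrid mutation.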
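$$\overleftarrow{H}_{t1} = \sigma\left(we_{y\overleftarrow{H}}\, y_{t1} + we_{\overleftarrow{H}\overleftarrow{H}}\, \overleftarrow{H}_{t1-1} + c_{\overleftarrow{H}}\right) \tag{30}$$
$$H_{t1} = \overrightarrow{H}_{t1} \oplus \overleftarrow{H}_{t1} \tag{31}$$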
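where $we$ denotes the weights connecting two layers, $c$ is the bias vector, $\sigma$ is the activation function, $\overrightarrow{H}_{t1}$ and $\overleftarrow{H}_{t1}$ are the forward and backward outputs of the GRU, and $\oplus$ is the operator that merges the two directional outputs (in BiGRU models this is usually concatenation).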
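A bidirectional GRU layer of the kind described by Eqs. 30 and 31 can be sketched in plain NumPy. The sketch uses the standard GRU update and reset gates rather than the single σ recurrence written in Eq. 30, implements ⊕ as concatenation (an assumption), and uses hypothetical helper names (`init_gru`, `gru_pass`, `bigru`) with small random weights in place of trained parameters.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_gru(input_dim, hidden_dim, rng):
    """Small random parameters for one GRU direction (hypothetical initialisation)."""
    params = {}
    for gate in ("z", "r", "h"):
        params["W" + gate] = rng.normal(0.0, 0.1, (hidden_dim, input_dim))
        params["U" + gate] = rng.normal(0.0, 0.1, (hidden_dim, hidden_dim))
        params["b" + gate] = np.zeros(hidden_dim)
    return params


def gru_pass(xs, p):
    """Run one GRU direction over xs of shape (T, input_dim); return (T, hidden_dim)."""
    h = np.zeros(p["bz"].shape[0])
    states = []
    for x in xs:
        z = sigmoid(p["Wz"] @ x + p["Uz"] @ h + p["bz"])             # update gate
        r = sigmoid(p["Wr"] @ x + p["Ur"] @ h + p["br"])             # reset gate
        h_cand = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h) + p["bh"])  # candidate state
        h = (1.0 - z) * h + z * h_cand                               # new hidden state
        states.append(h)
    return np.stack(states)


def bigru(xs, p_fwd, p_bwd):
    """Forward and backward passes, merged per time step by concatenation (Eq. 31)."""
    h_fwd = gru_pass(xs, p_fwd)
    h_bwd = gru_pass(xs[::-1], p_bwd)[::-1]  # flip back so rows align with time steps
    return np.concatenate([h_fwd, h_bwd], axis=1)


rng = np.random.default_rng(0)
tokens = rng.normal(size=(7, 4))          # 7 token embeddings of size 4 (toy input)
H = bigru(tokens, init_gru(4, 3, rng), init_gru(4, 3, rng))
```

The backward pass runs over the reversed sequence and is flipped back, so row t of the output holds both directional states for the same token.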
Attention mechanism
In sentiment analysis, the attention module captures the correlation between the terms in a sentence and the output [26]. For simplicity, this work uses a feed-forward attention model, which condenses the entire sequence into a single vector $\nu$:
$$\beta_{t1} = \frac{\exp(E_{t1})}{\sum_{s=1}^{R} \exp(E_s)} \tag{33}$$
$$\nu = \sum_{t1=1}^{R} \beta_{t1} H_{t1} \tag{34}$$
where $\beta_{t1}$ is the attention weight obtained by normalizing the learned score $E_{t1}$, which is computed from $H_{t1}$. As Eq. 34 shows, the attention mechanism turns the BGRU output sequence $H$ into a fixed-length vector $\nu$ by taking a weighted average over the sequence. The structure of the attention mechanism is shown in Fig. 3. The final representation used for classification is obtained as:
$$H^{\#} = \tanh(\nu) \tag{35}$$
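Eqs. 33 to 35 amount to a softmax over per-step scores, a weighted average and a tanh. Eq. 32, which defines the score $E_{t1}$, is not reproduced here, so the sketch assumes a hypothetical learnable scoring vector `w_a` with $E_{t1} = \tanh(H_{t1} \cdot w_a + b_a)$; the helper name `feed_forward_attention` is also hypothetical.

```python
import numpy as np


def feed_forward_attention(H, w_a, b_a=0.0):
    """Collapse BGRU outputs H of shape (R, d) into H# of shape (d,).

    The score E_t1 = tanh(H_t1 . w_a + b_a) is a hypothetical scoring function.
    """
    E = np.tanh(H @ w_a + b_a)          # stand-in for Eq. 32: one score per step, shape (R,)
    beta = np.exp(E - E.max())          # Eq. 33: softmax, shifted for numerical stability
    beta = beta / beta.sum()
    nu = beta @ H                       # Eq. 34: attention-weighted average, shape (d,)
    return np.tanh(nu), beta            # Eq. 35: H# = tanh(nu)


rng = np.random.default_rng(1)
H = rng.normal(size=(6, 4))             # 6 time steps, 4 features (toy values)
h_sharp, beta = feed_forward_attention(H, rng.normal(size=4))
```

With `w_a` set to zeros every score is equal, the weights β become uniform and ν reduces to the plain mean of H, which is a convenient sanity check.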
Sentiment classification
Twitter sentiment analysis is formally a classification problem. The proposed approach classifies the sentiment data into three classes: positive, negative and neutral. A softmax classifier is applied to the output $H^{\#}$ of the hidden layer for classification, represented as:
where $we$ is the weight matrix, $c$ is a bias vector and $H^{\#}$ is the output of the last hidden layer. The cross-entropy is used as the loss function:
$$loss_{fc} = -\frac{1}{n}\sum_{j=1}^{n} sen_j \log x_j + \|\theta\|^2 \tag{38}$$
where $n$ is the total number of samples, $sen_j$ is the true category of sentence $j$, $x_j$ is its predicted category, and $\|\theta\|^2$ is the L2 regularization term.
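The softmax classifier and the loss in Eq. 38 can be sketched as follows. The sketch assumes integer class indices instead of one-hot vectors $sen_j$ (with one-hot labels, $\sum sen_j \log x_j$ reduces to the log-probability of the true class), a weight matrix `we` of shape (3, d), and a hypothetical regularization coefficient `lam`; `lam = 1.0` reproduces Eq. 38 as printed.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def classify(h_sharp, we, c):
    """Class probabilities for a batch: softmax(H# we^T + c), shape (n, 3)."""
    return softmax(h_sharp @ we.T + c)


def cross_entropy_l2(probs, labels, theta, lam=1.0):
    """Eq. 38: mean negative log-probability of the true class plus lam * ||theta||^2."""
    n = probs.shape[0]
    nll = -np.log(probs[np.arange(n), labels]).mean()
    return float(nll + lam * np.sum(theta ** 2))


rng = np.random.default_rng(2)
h_batch = np.tanh(rng.normal(size=(5, 8)))     # five H# vectors of size 8 (toy values)
we, c = rng.normal(size=(3, 8)), np.zeros(3)   # hypothetical weights for 3 classes
probs = classify(h_batch, we, c)
loss = cross_entropy_l2(probs, np.array([0, 2, 1, 0, 2]), we)
```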
Dataset
The dataset used in the proposed work is Sentiment 140, obtained from [27], which contains 1,600,000 tweets extracted through the Twitter API. Each tweet is assigned a polarity score: 4 for positive tweets, 0 for negative tweets and 2 for neutral tweets. The dataset contains 20,832 positive, 18,318 neutral, 22,542 negative and 12,990 irrelevant tweets. Of the entire dataset, 70% is used for training, 15% for testing and 15% for validation. Table 2 shows the system configuration of the designed classifier.
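The label mapping and the 70/15/15 split described above can be sketched as follows. The sketch assumes a uniform random shuffle (the text does not say whether the split is stratified) and uses integer arithmetic so that the three parts always cover every sample exactly once; `SCORE_TO_LABEL` and `split_indices` are hypothetical names.

```python
import numpy as np

# Polarity scores as described for the dataset: 0 = negative, 2 = neutral, 4 = positive.
SCORE_TO_LABEL = {0: "negative", 2: "neutral", 4: "positive"}


def split_indices(n, rng, train_pct=70, test_pct=15):
    """Shuffle n sample indices and cut them into train/test/validation parts.

    Integer arithmetic keeps the parts disjoint and exhaustive; the validation
    part receives whatever remains after the training and test parts.
    """
    order = rng.permutation(n)
    n_train = n * train_pct // 100
    n_test = n * test_pct // 100
    return order[:n_train], order[n_train:n_train + n_test], order[n_train + n_test:]


labels = [SCORE_TO_LABEL[s] for s in (4, 0, 2, 4)]
train_idx, test_idx, val_idx = split_indices(1_000, np.random.default_rng(0))
```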
Performance metrics
The proposed LTF-MICF scheme and four existing term weighting schemes are evaluated with the proposed and existing classifiers using four performance metrics: precision, F1-score, recall and accuracy. These metrics are computed from four quantities: true positives ($t_p$), true negatives ($t_n$), false positives ($f_p$) and false negatives ($f_n$).
Accuracy ($A_c$)
Accuracy is the proportion of samples in the dataset that the classifier labels correctly. For the proposed method it is obtained using Eq. 39:
$$A_c = \frac{t_p + t_n}{t_p + t_n + f_p + f_n} \tag{39}$$
Precision ($P_r$)
Precision is the ratio of correctly identified positive samples to all samples identified as positive. For the proposed method it is obtained using Eq. 40:
$$P_r = \frac{t_p}{t_p + f_p} \tag{40}$$
Recall ($R_e$)
Recall is the ratio of correctly identified positive samples to all actual positive samples in the dataset. For the proposed method it is obtained using Eq. 41:
$$R_e = \frac{t_p}{t_p + f_n} \tag{41}$$
F1-score ($F_s$)
The F1-score is the harmonic mean of precision and recall. For the proposed method it is obtained using Eq. 42:
$$F_s = 2 \cdot \frac{P_r \cdot R_e}{P_r + R_e} \tag{42}$$
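Eqs. 39 to 42 translate directly into code. For the three-class problem, the counts are typically taken one-vs-rest for each class from a confusion matrix; that per-class step (`per_class_counts`) and the helper name `metrics` are assumptions added for illustration, and the confusion matrix is toy data.

```python
import numpy as np


def metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1-score from Eqs. 39-42."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)              # Eq. 39
    precision = tp / (tp + fp)                              # Eq. 40
    recall = tp / (tp + fn)                                 # Eq. 41
    f1 = 2 * precision * recall / (precision + recall)      # Eq. 42
    return accuracy, precision, recall, f1


def per_class_counts(cm, k):
    """One-vs-rest (tp, tn, fp, fn) for class k; cm[i, j] counts true i predicted as j."""
    tp = cm[k, k]
    fp = cm[:, k].sum() - tp
    fn = cm[k, :].sum() - tp
    tn = cm.sum() - tp - fp - fn
    return int(tp), int(tn), int(fp), int(fn)


cm = np.array([[5, 1, 0],      # toy confusion matrix for positive/negative/neutral
               [2, 6, 1],
               [0, 1, 4]])
acc, prec, rec, f1 = metrics(*per_class_counts(cm, 0))
```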
94.12%, 93.76% and 93.59%. With the LTF-MICF term weighting scheme, however, the precision of Bi-GRU rises to 95.22%. The proposed GARN achieves a precision of 96.03%, 95.67%, 94.90% and 93.90% with TF-DFS, TF-IDF, TF and W2V, respectively, and 96.65% with the proposed LTF-MICF scheme, which makes GARN with LTF-MICF the best combination of classifier and term weighting scheme. Figure 5 shows that the GARN classifier with the LTF-MICF term weighting scheme achieves the highest precision of all the classifiers and term weighting schemes. Table 5 summarizes the precision of the existing and proposed classifiers with each term weighting scheme.

Table 4 Accuracy of the proposed and existing classifiers with term weights
Term weighting-based feature extraction schemes
Classifiers | LTF-MICF (%) | TF-DFS (%) | TF-IDF (%) | TF (%) | W2V (%)

Table 5 Precision of the proposed and existing classifiers with term weights
Term weighting-based feature extraction schemes
Classifiers | LTF-MICF (%) | TF-DFS (%) | TF-IDF (%) | TF (%) | W2V (%)
Figure 6 shows the F-measure of the four existing classifiers and the proposed classifier with the different term weighting schemes. For all the existing classifiers, the F-measure obtained with the other term weighting schemes is lower than that obtained with the proposed scheme. For Bi-LSTM, the F-measure with TF-DFS, TF-IDF, TF and W2V is 93.34%, 92.77%, 92.28% and 91.89%, respectively, and it rises to 95.22% with LTF-MICF. The proposed GARN achieves an F-measure of 96.10%, 95.65%, 94.90% and 94.00% with TF-DFS, TF-IDF, TF and W2V, respectively, and 96.70% with LTF-MICF, the best combination of classifier and term weighting scheme. Figure 6 therefore shows that the GARN model with the LTF-MICF scheme achieves the highest F-measure among all the DL models and term weighting schemes. Table 6 summarizes the F-measure of the existing and proposed classifiers with each term weighting scheme.

Table 6 F-measure of the proposed and existing classifiers with term weights
Term weighting-based feature extraction schemes
Classifiers | LTF-MICF (%) | TF-DFS (%) | TF-IDF (%) | TF (%) | W2V (%)
Figure 7 illustrates the recall of the four existing DL models and the proposed model with the different term weighting schemes. For all the existing classifiers, the recall obtained with the other term weighting schemes is lower than that obtained with the proposed scheme. For RNN, the recall with TF-DFS, TF-IDF, TF and W2V is 91.83%, 90.65%, 90.36% and 89.04%, respectively, and it rises to 92.25% with LTF-MICF. The proposed GARN achieves a recall of 96.23%, 95.77%, 94.09% and 94.34% with TF-DFS, TF-IDF, TF and W2V, respectively, and 96.76% with LTF-MICF, the best combination of classifier and term weighting scheme. Figure 7 therefore shows that the GARN model with the LTF-MICF scheme secures the highest recall among all the DL models and term weighting schemes. Table 7 summarizes the recall of the existing and proposed classifiers with each term weighting scheme.
Table 7 Recall of the proposed and existing classifiers with term weights
Term weighting-based feature extraction schemes
Classifiers | LTF-MICF (%) | TF-DFS (%) | TF-IDF (%) | TF (%) | W2V (%)
Discussion
The proposed work is implemented in five stages: Twitter data collection, tweet pre-processing, term weighting-based feature extraction, feature selection, and classification of the sentiments expressed in the tweets. First, the tweet sentiment dataset is pre-processed: the tweets are tokenized and stemmed, and punctuation, symbols, numbers, hashtags and acronyms are removed, which yields a clean pre-processed dataset. Table 8 compares the performance of the proposed and existing methods on this objective.
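The pre-processing steps listed above can be sketched with Python's standard `re` module. This is a minimal sketch, not the authors' pipeline: the acronym list is hypothetical, and `crude_stem` is a simple suffix-stripping stand-in for a real stemmer such as Porter's, so its stems (e.g. "loved" becomes "lov") differ from what a proper stemmer would produce.

```python
import re

# Hypothetical acronym list; the acronyms actually removed are not listed in the text.
ACRONYMS = {"lol", "omg", "btw", "idk", "imo"}
# Crude suffix stripping as a stand-in for a real stemmer (assumption).
SUFFIXES = ("ing", "ed", "ly", "s")


def crude_stem(token):
    for suffix in SUFFIXES:
        # Only strip when a stem of at least three characters remains.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(tweet):
    text = tweet.lower()
    text = re.sub(r"#\w+", " ", text)          # drop hashtags
    text = re.sub(r"[^a-z\s]", " ", text)      # drop punctuation, symbols and numbers
    tokens = text.split()                      # whitespace tokenization
    return [crude_stem(t) for t in tokens if t not in ACRONYMS]


print(preprocess("I loved the #movie!!! 10/10 lol"))  # ['i', 'lov', 'the']
```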
From this pre-processed dataset, features are extracted with LTF-MICF, a novel term weighting scheme that integrates the LTF and MICF term weights. HMWSO, an optimization algorithm that uses two hybrid mutation techniques, Cauchy and Gaussian mutation, is then used for feature selection. Finally, the GARN classifier classifies the Twitter sentiments as positive, negative or neutral. The performance of the existing classifiers and of the proposed classifier is analyzed with each term weighting scheme, and Table 9 compares the proposed and existing methods. The results of the existing methods are taken from previous work on sentiment analysis of Twitter datasets.
Table 8 Performance of the proposed and existing methods for the developed objective
Models | Accuracy | Precision | Sensitivity | F1-measure
Many DL techniques rely on a single feature extraction technique, such as term frequency (TF) or the distinguishing feature selector (DFS), which may not extract the features accurately, and using the methods without optimization can reduce the model's accuracy. The feature extraction technique used in the proposed work performs well because it extracts features from the frequently occurring terms in the document, and the optimization algorithm further increases the accuracy of the designed model. The achieved results are shown in Fig. 8.
Figure 9(a) compares the accuracy as the number of selected features varies, and Fig. 9(b) shows the ROC curve of the proposed model. The ROC curve plots the true positive rate (TPR) against the false positive rate (FPR). The area under the curve (AUC) of the proposed model is 0.989, which indicates that the model classifies accurately with a low error rate.
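The ROC curve and its AUC can be computed by sweeping a decision threshold over the classifier's scores and integrating TPR against FPR with the trapezoidal rule. The NumPy sketch below covers the binary (one-vs-rest) case, which is an assumption for a three-class model, requires both classes to be present, and treats tied scores as a single threshold; `roc_points` and `auc_trapezoid` are hypothetical names.

```python
import numpy as np


def roc_points(y_true, scores):
    """Return (fpr, tpr) arrays, one point per distinct score threshold."""
    y_true = np.asarray(y_true, dtype=float)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="stable")      # highest score first
    s_sorted, y_sorted = scores[order], y_true[order]
    # Last index of each run of tied scores, so ties form a single threshold.
    cut = np.r_[np.nonzero(np.diff(s_sorted))[0], len(s_sorted) - 1]
    tps = np.cumsum(y_sorted)[cut]                  # true positives above each threshold
    fps = np.cumsum(1.0 - y_sorted)[cut]            # false positives above each threshold
    tpr = np.r_[0.0, tps / tps[-1]]
    fpr = np.r_[0.0, fps / fps[-1]]
    return fpr, tpr


def auc_trapezoid(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


fpr, tpr = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc_trapezoid(fpr, tpr))   # 0.75
```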
Ablation study
Table 10 presents the ablation study for the proposed model, reporting the performance of the overall architecture together with a comparison against existing techniques. Among all the techniques, the proposed GARN achieves the best performance. The hybridized methods are also analysed separately, and the results show that integrating all the methods improves the overall efficiency compared with applying each technique on its own. The ablation study for the feature selection process is also evaluated, and its results are provided in Table 10. The existing classification and feature selection methods used for comparison are GRN (gated recurrent network), ARN (attention-based recurrent network), RNN (recurrent neural network), WSO and MO (mutation optimization).
The computational complexity of the proposed model is as follows: the attention model has a complexity of $O(n^2 \cdot d)$, the recurrent network $O(n \cdot d^2)$, and a complexity of $O(k \cdot n^2 \cdot d)$ is also given. This shows that the proposed model achieves efficient performance with reduced system complexity. Used separately, however, these models do not provide satisfactory performance; integrated, they perform better than the existing methods.
Conclusion
GARN is used in this research to identify the opinions of users of the Twitter online platform. The implementation was carried out on the Sentiment 140 dataset. The performance of the proposed GARN classifier is compared with the DL models Bi-GRU, Bi-LSTM, RNN and CNN on four performance metrics (accuracy, precision, F-measure and recall) with five term weighting schemes: LTF-MICF, TF-DFS, TF-IDF, TF and W2V. The evaluation shows that the GARN DL technique reaches the target level for Twitter sentiment classification. Moreover, combining the proposed LTF-MICF term weighting-based feature extraction with the GARN classifier yields efficient results for tweet feature extraction. On the Twitter dataset, GARN with LTF-MICF achieves an accuracy of 97.86%, the highest of all the classifiers compared. The proposed GARN classifier can therefore be regarded as an effective DL classifier for Twitter sentiment analysis and other sentiment analysis applications. Although the proposed model achieves satisfactory results, it has not yet reached the required level, because the architecture does not give equal importance to all the selected features. As a result, some important features are lost, which reduces the performance of the proposed model. As future work, an effective DL technique with the best feature selection method, which utilizes all the selected features, will be introduced for visual sentiment classification. Further, since this method was analysed on a small dataset, larger datasets with challenging images will be used in future to analyse the performance of the present architecture.
Abbreviations
DL Deep Learning
GARN Gated attention recurrent network
LTF-MICF Log Term Frequency-based Modified Inverse Class Frequency
HMWSO Hybrid mutation based white shark optimizer
RNN Recurrent neural network
NLP Natural Language Processing
SVM Support Vector Machine
NB Naïve Bayes
Acknowledgements
Not applicable.
Author contributions
NP and PC devised the proposed algorithms, obtained the datasets for the research, explored the different methods discussed and contributed to refining the study objectives and framework. Their rich experience was instrumental in improving our work. BTH and AS carried out the literature survey and contributed to writing the paper. All authors contributed to the editing and proofreading. All authors read and approved the final manuscript.
Funding
Authors did not receive any funding for this study.
Declarations
Ethics approval and consent to participate
Not applicable.
Competing interests
The authors declare that they have no competing interests.
References
1. Saberi B, Saad S. Sentiment analysis or opinion mining: a review. Int J Adv Sci Eng Inf Technol. 2017;7(5):1660–6.
2. Medhat W, Hassan A, Korashy H. Sentiment analysis algorithms and applications: a survey. Ain Shams Eng J. 2014;5(4):1093–113.
3. Drus Z, Khalid H. Sentiment analysis in social media and its application: systematic literature review. Procedia Comput Sci. 2019;161:707–14.
4. Zeglen E, Rosendale J. Increasing online information retention: analyzing the effects. J Open Flex Distance Learn. 2018;22(1):22–33.
5. Qian Y, Deng X, Ye Q, Ma B, Yuan H. On detecting business event from the headlines and leads of massive online news articles. Inf Process Manage. 2019;56(6):102086.
6. Osatuyi B. Information sharing on social media sites. Comput Hum Behav. 2013;29(6):2622–31.
7. Neubaum G. Monitoring and expressing opinions on social networking sites: empirical investigations based on the spiral of silence theory. PhD dissertation. Duisburg-Essen: Universität Duisburg-Essen; 2016.
8. Karami A, Lundy M, Webb F, Dwivedi YK. Twitter and research: a systematic literature review through text mining. IEEE Access. 2020;8:67698–717.
9. Antonakaki D, Fragopoulou P, Ioannidis S. A survey of Twitter research: data model, graph structure, sentiment analysis and attacks. Expert Syst Appl. 2021;164:114006.
10. Birjali M, Kasri M, Beni-Hssane A. A comprehensive survey on sentiment analysis: approaches, challenges and trends. Knowl-Based Syst. 2021;226:107134.
11. Yadav N, Kudale O, Rao A, Gupta S, Shitole A. Twitter sentiment analysis using supervised machine learning. In: Intelligent data communication technologies and internet of things. Singapore: Springer; 2021. p. 631–42.
12. Jain PK, Pamula R, Srivastava G. A systematic literature review on machine learning applications for consumer sentiment analysis using online reviews. Comput Sci Rev. 2021;41:100413.
13. Pandian AP. Performance evaluation and comparison using deep learning techniques in sentiment analysis. J Soft Comput Paradigm. 2021;3(2):123–34.
14. Gandhi UD, Kumar PM, Babu GC, Karthick G. Sentiment analysis on Twitter data by using convolutional neural network (CNN) and long short term memory (LSTM). Wirel Pers Commun. 2021;17:1–10.
15. Kaur H, Ahsaan SU, Alankar B, Chang V. A proposed sentiment analysis deep learning algorithm for analyzing COVID-19 tweets. Inf Syst Front. 2021;23(6):1417–29.
16. Alharbi AS, de Doncker E. Twitter sentiment analysis with a deep neural network: an enhanced approach using user behavioral information. Cogn Syst Res. 2019;54:50–61.
17. Tam S, Said RB, Özgür Tanriöver Ö. A ConvBiLSTM deep learning model-based approach for Twitter sentiment classification. IEEE Access. 2021;9:41283–93.
18. Chugh A, Sharma VK, Kumar S, Nayyar A, Qureshi B, Bhatia MK, Jain C. Spider monkey crow optimization algorithm with deep learning for sentiment classification and information retrieval. IEEE Access. 2021;9:24249–62.
19. Alamoudi ES, Alghamdi NS. Sentiment classification and aspect-based sentiment analysis on yelp reviews using deep learning and word embeddings. J Decis Syst. 2021;30(2–3):259–81.
20. Tan KL, Lee CP, Anbananthen KSM, Lim KM. RoBERTa-LSTM: a hybrid model for sentiment analysis with transformer and recurrent neural network. IEEE Access. 2022;10:21517–25.
21. Hasib KM, Habib MA, Towhid NA, Showrov MIH. A novel deep learning based sentiment analysis of Twitter data for US airline service. In: 2021 International Conference on Information and Communication Technology for Sustainable Development (ICICT4SD). IEEE; 2021. p. 450–5.
22. Zhao H, Liu Z, Yao X, Yang Q. A machine learning-based sentiment analysis of online product reviews with a novel term weighting and feature selection approach. Inf Process Manage. 2021;58(5):102656.
23. Braik M, Hammouri A, Atwan J, Al-Betar MA, Awadallah MA. White Shark Optimizer: a novel bio-inspired meta-heuristic algorithm for global optimization problems. Knowl-Based Syst. 2022;243:108457.
24. Carvalho F, Guedes GP. TF-IDFC-RF: a novel supervised term weighting scheme. arXiv preprint arXiv:2003.07193. 2020.
25. Zeng L, Ren W, Shan L. Attention-based bidirectional gated recurrent unit neural networks for well logs prediction and lithology identification. Neurocomputing. 2020;414:153–71.
26. Niu Z, Yu Z, Tang W, Wu Q, Reformat M. Wind power forecasting using attention-based gated recurrent unit network. Energy. 2020;196:117081.
27. Sentiment140 dataset. Kaggle. https://www.kaggle.com/datasets/kazanova/sentiment140
28. Ahuja R, Chug A, Kohli S, Gupta S, Ahuja P. The impact of features extraction on the sentiment analysis. Procedia Comput Sci. 2019;152:341–8.
29. Gupta B, Negi M, Vishwakarma K, Rawat G, Badhani P. Study of Twitter sentiment analysis using machine learning algorithms on Python. Int J Comput Appl. 2017;165(9):29–34.
30. Ikram A, Kumar M, Munjal G. Twitter sentiment analysis using machine learning. In: 2022 12th International Conference on Cloud Computing, Data Science & Engineering (Confluence). IEEE; 2022. p. 629–34.
31. Gaye B, Zhang D, Wulamu A. A tweet sentiment classification approach using a hybrid stacked ensemble technique. Information. 2021;12(9):374.
32. Ahmed K, Nadeem MI, Li D, Zheng Z, Ghadi YY, Assam M, Mohamed HG. Exploiting stacked autoencoders for improved sentiment analysis. Appl Sci. 2022;12(23):12380.
33. Subba B, Kumari S. A heterogeneous stacking ensemble based sentiment analysis framework using multiple word embeddings. Comput Intell. 2022;38(2):530–59.
34. Pu X, Yan G, Yu C, Mi X, Yu C. Sentiment analysis of online course evaluation based on a new ensemble deep learning mode: evidence from Chinese. Appl Sci. 2021;11(23):11313.
35. Chen J, Chen Y, He Y, Xu Y, Zhao S, Zhang Y. A classified feature representation three-way decision model for sentiment analysis. Appl Intell. 2022;1:1–13.
36. Jain DK, Boyapati P, Venkatesh J, Prakash M. An intelligent cognitive-inspired computing with big data analytics framework for sentiment analysis and classification. Inf Process Manage. 2022;59(1):102758.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.