Sentiment Analysis Based Twitter Tweets Classification Using Data Embedded With LSTM Technique
ABSTRACT
Over the last two decades, social media sites have become part of our daily lives.
Collecting information from social media, following trends in social media, and knowing about people's
feelings and emotions on social media are all very important today. Twitter sentiment analysis is an
application of sentiment analysis on data from Twitter (tweets), in order to extract sentiments conveyed by
the user. Twitter emotion recognition has gained a lot of attention nowadays due to its numerous uses in the
business and government sectors. It is possible to conduct many analyses using this source. One of the most
essential of these analytics, sentiment analysis, is gaining popularity. Real-time messaging and opinion
sharing in social media websites have made them valuable sources of different kinds of information.
Sentiment analysis attempts to extract beliefs, attitudes, and feelings from social media platforms like
Twitter. Deep Learning (DL) approaches have become increasingly popular among researchers in recent
years and offer solutions to a wide range of issues. However, there is a need to explore the efficacy of
real-time systems that include popular and real-time focused tweets. Hence, in this work, sentiment
analysis based Twitter tweet classification using data embedded with the LSTM technique is presented.
LSTM networks and Convolutional Neural Networks have been shown to be effective for sentiment analysis
applications. In this analysis, the Long Short-Term Memory (LSTM) method is used to evaluate a tweet's
emotional content. The presented approach achieves better results than a baseline DL approach in terms of
Accuracy, Precision and True Positive Rate (TPR).
KEYWORDS: Sentiment Analysis, Deep Learning, Twitter, Long Short-Term Memory, Valence Aware
Dictionary and sEntiment Reasoner (VADER).
Journal of Theoretical and Applied Information Technology
28th February 2023. Vol.101. No 4
© 2023 Little Lion Scientific
using the platform’s enormous volumes of micro blogging data [6]. Using succinct 140-character messages known as "tweets," users are now likely to spread information on various features of Twitter. Additionally, they follow other users in order to get their Twitter status updates. People use Twitter as a popular instant messaging service to stay informed about current events, such as worldwide news and advances in science. Sentiment analysis attempts to gather beliefs, attitudes, and feelings from social media platforms like Twitter. It has become more popular as a research area. The typical method of sentiment analysis places the greatest emphasis on textual data. The most well-known micro blogging social networking site is Twitter, where users send updates about various topics as tweets [9]. The ultimate purpose of this analysis is to understand the emotional orientation toward entities such as products, organizations, individuals, events, issues, or topics.

Due to the substantial development and popularization of Twitter as a platform for citizens to share their ideas and attitudes on a wide range of topics, Twitter sentiment analysis has drawn a lot of attention. Twitter sentiment analysis allows us to keep track of service and product reviews on social media, and it may help to detect negative mentions and angry customers. Twitter sentiment classification, which determines the polarity of a tweet’s sentiment, has gained much attention in the field of natural language processing. Individuals, businesses, and governments all benefit from it because they want to understand people's viewpoints on various products and policies. In order to extract emotional expressions from text messages (tweets) on Twitter, complex techniques for sentiment text analysis are typically implemented. These techniques typically classify sentiment into three categories: positive, negative, and neutral. They also support the analysis of opinions, dialogues, announcements, and news (within a single thread of tweets) in order to develop business strategies, political analyses, and assessments of public action, among other things [4].

The automatic technique of text sentiment analysis can evaluate the sentiment polarity of a text segment as well as whether it contains objective or subjective content. Twitter sentiment categorization is used to automatically evaluate whether a tweet has a positive or negative emotion. Numerous techniques for extracting useful features and classifying text into appropriate polarity labels are based on various approaches to natural language processing and Machine Learning. The simplest sentiment classification results were obtained using supervised Machine Learning (ML) techniques such as Naive Bayes and Support Vector Machines, but the manual labeling required for the supervised approach is incredibly expensive. There has been some work done on unsupervised and semi-supervised techniques, and there is still much room for improvement.

The reason behind this is the challenging format of the tweets, which makes processing difficult. Tweets are very short, which generates a whole new dimension of problems such as the use of slang and abbreviations. A lot of research has been done on Twitter data in order to classify the tweets and analyze the results. However, due to the presence of non-useful characters (collectively termed noise) along with useful data, it becomes difficult to implement models on them. With the popularity of deep learning techniques in recent years, different deep neural networks have been successfully used in the field [12]. Deep learning techniques have achieved outstanding results in numerous natural language processing applications, including sentiment analysis. LSTM networks and Convolutional Neural Networks in particular have been shown to be effective for sentiment analysis applications. Hence, in this work, sentiment analysis based Twitter tweet classification using data embedded with the LSTM technique is presented.

The remainder of the paper is organized as follows: the research on Twitter sentiment analysis is described in Section 2. Section 3 demonstrates the sentiment analysis based Twitter tweet classification using data embedded with the LSTM technique. Section 4 describes the result analysis of the presented approach. Section 5 concludes the work.

2. LITERATURE SURVEY

Zhao Jianqiang, Gui Xiaolin, and Zhang Xuejun [14] present deep convolution neural networks for Twitter sentiment analysis. In this methodology, a word embedding method that uses latent contextual semantic linkages and co-occurrence statistical features between words in tweets is described. For training and predicting sentiment classification labels, the feature set is integrated into a deep convolution neural network. On five Twitter data sets, the effectiveness of this model was experimentally compared with the baseline model, the term n-grams model. The
results show that the described model performs better on accuracy and the F1-measure for classifying Twitter sentiment.

Yusuf Arslan, Aysenur Birturk, Bekjan Djumabaev, and Dilek Kucuk [15] offer Real-Time Lexicon-Based Sentiment Analysis Experiments on Twitter with A Mild (More Information, Less Data) Approach. The authors utilized different sentiment analysis approaches, including dynamic dictionaries and models, and performed tests on limited but relevant datasets to understand the popularity of some terms and user perceptions of them. These experiments have produced promising results.

Yafeng Ren, Meishan Zhang, Yue Zhang, and Donghong Ji [17] utilize a neural network to provide context-sensitive Twitter sentiment classification. They present a context-based neural network model for Twitter sentiment analysis that integrates contextualized data from relevant tweets into the model as word embedding vectors. The experimental results demonstrated that the proposed context-based model performs better than the state-of-the-art discrete and continuous word representation models, proving the efficacy of the context-based neural network model for this task. Additionally, the authors discovered that topic-based contextual features are the most successful in context-based neural network settings.

Hassan Saif, Yulan He, Miriam Fernandez, and Harith Alani [19] present contextual semantics for Twitter sentiment analysis. This method can detect sentiment at both the entity and tweet levels. Utilizing three alternative sentiment lexicons to infer prior word sentiments, the presented technique is tested on three Twitter datasets. The approach outperforms the baselines in terms of accuracy and F-measure for entity-level subjectivity (neutral vs. polar) and polarity (positive vs. negative) detection. On two out of three datasets for tweet-level sentiment identification, this method outperformed the lexicon-based method SentiStrength in terms of overall performance.

Monisha Kanakaraj and Ram Mohana Reddy Guddeti [20] explain sentiment analysis based on NLP using ensemble classifiers on Twitter data. They describe a Natural Language Processing (NLP)-based technique for enhancing sentiment classification by including semantics in feature vectors and utilizing ensemble methods for classification. Prediction accuracy is increased by enhancing feature vectors with context-sense identities and semantically related words. According to their analyses, the semantics-based feature vector with ensemble classification performs 3–5% better than the conventional bag-of-words technique with a single machine learning method.

However, the existing simple approaches are statistical, based on the frequency of positive and negative words. Recent advances accounting for additional content features have been made by researchers, but there are still difficulties in this area of research. To overcome these challenges and to develop an effective approach, sentiment analysis based Twitter tweet classification using data embedded with the LSTM technique is presented.

3. SENTIMENT ANALYSIS BASED TWITTER TWEETS CLASSIFICATION

In this section, sentiment analysis based Twitter tweet classification using data embedded with the LSTM technique is presented. Fig. 1 shows the work flow diagram of the presented technique.

In this approach, different types of tweets are collected in real time to classify the sentiments in those tweets. A tweet is a micro blog message that is posted on Twitter. Only 140 characters are allowed. The majority of tweets are text-based with integrated URLs, images, usernames, and emoticons. Tweets also contain mistakes. As a result, a series of preprocessing steps is performed to remove unnecessary information from tweets.

Pre-processing: Raw tweets collected from Twitter typically produce a noisy dataset. This is a result of people using social media in a casual manner. Retweets, emoticons, user mentions, and other unique aspects of tweets must be handled appropriately. Therefore, normalizing the raw Twitter data is necessary to provide a dataset that can be readily used to train different classifiers. The dataset is standardized and its size is decreased using a number of pre-processing procedures.
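The pre-processing step described in this section can be sketched as a small cleaning function. This is a minimal illustration, not the authors' pipeline: the function name `clean_tweet`, the regular expressions, and the choice to drop punctuation-based emoticons are all assumptions made for the example.

```python
import re

# Hypothetical patterns for common tweet noise (assumptions for this sketch).
RETWEET_RE = re.compile(r"^\s*RT\b\s*")          # leading retweet marker
URL_RE = re.compile(r"https?://\S+|www\.\S+")    # links
MENTION_RE = re.compile(r"@\w+")                 # user mentions
HASHTAG_RE = re.compile(r"#(\w+)")               # keep the tag word, drop '#'
NON_ALPHA_RE = re.compile(r"[^a-z\s]")           # anything but lowercase letters/space
WS_RE = re.compile(r"\s+")                       # runs of whitespace


def clean_tweet(text: str) -> str:
    """Return a lowercased tweet with URLs, mentions and symbols removed."""
    text = RETWEET_RE.sub("", text)
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = HASHTAG_RE.sub(r"\1", text)
    text = text.lower()
    text = NON_ALPHA_RE.sub(" ", text)
    return WS_RE.sub(" ", text).strip()


if __name__ == "__main__":
    print(clean_tweet("RT @user: Loving the new phone!!! http://t.co/abc #Happy :)"))
```

Note that this sketch strips punctuation, so emoticons such as `:)` are lost; a pipeline that later scores text with an emoticon-aware lexicon may prefer to keep them.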
analysis, they must be expanded to their original full-word form.

Data labeling allows DL algorithms to gain a thorough understanding of real-world environments and conditions. A common starting point for data categorization is to gather opinions from people regarding a certain set of unlabeled data. The unlabeled and unstructured data domain is the focus of many data science issues. Clustering is a key technique for analysing unlabeled data, so understanding how to cluster is crucial in this situation. Clustering can only be done on numerical values, as it primarily calculates the similarity of each data point with respect to the others by calculating their mathematical distance (Euclidean, Manhattan, Minkowski, etc.). Hence, one can either cluster the textual data points by converting them into a numerical dissimilarity matrix, or capture the semantics of the textual data points by vectorizing them and clustering them by their word embeddings.

If a tweet has 4 positive, 2 neutral, and 2 negative words, the sentiment of the tweet is calculated from these counts. Thus, the tweet would be marked as positive, as positive words are dominant.

The Word2Vec and GloVe (Global Vectors for Word Representation) word embedding models were utilized in this study. 25-dimensional word vectors were produced using the Word2Vec model. The CBOW (Continuous Bag of Words) model was utilized in the development of Word2Vec. In addition, words that appeared only four times or fewer were eliminated. The maximum skip distance between words was then set to 5. The Word2Vec model is used to extract the concept of relatedness across words or products for applications such as semantic relatedness, synonym detection, concept classification, selectional preferences, and analogies. GloVe is a distributed word representation approach whose name is derived from Global Vectors. The model learns word vector representations using an unsupervised learning algorithm. This is accomplished by mapping words into a meaningful space where the distance between words is related to their semantic similarity. GloVe is used with its pre-trained word vectors. The following equation is used to standardize each vector:

$$v_i' = \frac{v_i - v_{\min}}{v_{\max} - v_{\min}} \qquad (1)$$

where $v_i'$ is the normalized $i$-th value of a 25-dimensional vector, $v_i$ is its original $i$-th value, $v_{\min}$ is the vector's minimum value, and $v_{\max}$ is its maximum value.

The word vectors of a tweet are combined into a single vector before the sentence vectors are created. To maintain information in a phrase and long-distance dependencies across sentences throughout the prediction process, a supplemental strategy in word embedding is to partition the word vectors of a sentence into regions. The punctuation marks in a sentence are used to divide the content.

By identifying the words that co-occur with each phrase in the tweet, its context can be determined at the same time. The sentiment polarity of each sentence is estimated by looking at the orientations of the words that are commonly seen together. This method differs from others in that it does not give words a predetermined prior sentiment polarity. In order to capture the contextual semantics of words, it takes into account their co-occurrence patterns in various situations. The statement "You shall know a word by the company it keeps" serves as the main tenet of the concept of contextual semantics. This suggests that words that co-occur in a given context have a specific relationship to each other, which, if captured, can provide insights into their sentiment orientations and significantly enhance sentiment analysis accuracy.

Following the discovery of the co-occurring terms, the VADER (Valence Aware Dictionary and sEntiment Reasoner) sentiment lexicon is used to determine the overall sentiment polarities of all these co-occurring words. Additionally, VADER handles slang, acronyms, and emojis well. It expresses not only the polarity of the sentiment, such as positive or negative, but also its strength. It outputs four scores: positive (pos), negative (neg), neutral (neu), and compound, or overall score. The pos, neg, and neu scores show how much of the text falls into the positive, negative, and neutral categories. The compound score is an overall value obtained by normalizing the sum of all lexicon ratings.

SentiCircle’s main premise is that a word's orientation is dynamic and constantly changes depending on its context, rather than being static or
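The min-max standardization of Eq. (1) can be sketched in a few lines of plain Python. This is an illustrative sketch only: the function name `min_max_normalize` is hypothetical, and returning zeros for a constant vector (where the denominator of Eq. (1) would be zero) is an assumption of this example, not something stated in the text.

```python
def min_max_normalize(vector):
    """Scale each component to [0, 1] using (v_i - v_min) / (v_max - v_min)."""
    v_min, v_max = min(vector), max(vector)
    span = v_max - v_min
    if span == 0:
        # Assumption: a constant vector maps to all zeros to avoid division by zero.
        return [0.0 for _ in vector]
    return [(v - v_min) / span for v in vector]


if __name__ == "__main__":
    # A short vector stands in for a 25-dimensional word embedding.
    print(min_max_normalize([2.0, 4.0, 6.0]))
```

After this scaling, the smallest component of every word vector is 0 and the largest is 1, so vectors from different words share a common range.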
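As a hedged illustration of how one LSTM cell update works, the sketch below implements the standard textbook LSTM step in plain Python. It is an assumption that the presented model follows this standard form. The function names (`sigmoid`, `affine`, `lstm_step`) and the layout of the parameter dictionary are hypothetical. The variable names mirror the notation used in this section: `x_t` for the input, `W_*` and `b_*` for weights and biases, `c_t` for the cell state, `c_tilde` for the candidate state, and `h_t` for the cell output.

```python
import math


def sigmoid(x):
    """Logistic function used by the forget, input and output gates."""
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, x, b):
    """Compute W @ x + b for a list-of-rows matrix W and list vectors x, b."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One standard LSTM cell update; returns (h_t, c_t)."""
    z = h_prev + x_t  # list concatenation: [h_{t-1}, x_t]
    f_t = [sigmoid(a) for a in affine(params["W_f"], z, params["b_f"])]
    i_t = [sigmoid(a) for a in affine(params["W_i"], z, params["b_i"])]
    o_t = [sigmoid(a) for a in affine(params["W_o"], z, params["b_o"])]
    c_tilde = [math.tanh(a) for a in affine(params["W_c"], z, params["b_c"])]
    # New cell state: keep part of the old state, add part of the candidate.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]
    # Cell output: gated, squashed cell state.
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


if __name__ == "__main__":
    hidden, inputs = 2, 3
    zeros = [[0.0] * (hidden + inputs) for _ in range(hidden)]
    params = {k: zeros for k in ("W_f", "W_i", "W_o", "W_c")}
    params.update({k: [0.0] * hidden for k in ("b_f", "b_i", "b_o", "b_c")})
    print(lstm_step([1.0, 2.0, 3.0], [0.0, 0.0], [1.0, -2.0], params))
```

With all weights set to zero, every gate evaluates to 0.5 and the candidate state to 0, so the new cell state is simply half of the previous one. This makes the role of the forget gate easy to see.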
In the LSTM formulation, tanh stands for the hyperbolic tangent activation function, $X$ stands for the input data, $W$ and $b$ stand for the weight and bias factors, respectively, $C_t$ stands for the cell state, $\tilde{c}_t$ stands for the candidate cell state, and $h_t$ stands for the output of the LSTM cell. The SentiScore of each tweet is aggregated to get its final sentiment orientation. The final outcome of the methodology is a five-class fine-grained sentiment classification into highly negative, negative, neutral, positive, and highly positive categories.

True Positive Rate: It counts the number of records that are accurately categorised as positive out of the total positive records. It is also known as sensitivity or recall and is expressed as

$$TPR = \frac{TP}{TP + FN} \times 100 \qquad (9)$$

Precision: Precision measures how many of the instances predicted as positive are actually positive and is expressed as

$$Precision = \frac{TP}{TP + FP} \times 100 \qquad (10)$$

where $TP$, $FN$, and $FP$ denote the numbers of true positives, false negatives, and false positives, respectively.
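Equations (9) and (10) can be checked on a small set of labels with the sketch below. The helper names (`confusion_counts`, `true_positive_rate`, `precision`) and the toy labels are made up for illustration, and returning 0.0 when a denominator is zero is an assumption of this example.

```python
def confusion_counts(y_true, y_pred, positive="positive"):
    """Count true positives, false positives and false negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, fp, fn


def true_positive_rate(tp, fn):
    """Eq. (9): TP / (TP + FN) * 100, as a percentage."""
    total = tp + fn
    return 100.0 * tp / total if total else 0.0


def precision(tp, fp):
    """Eq. (10): TP / (TP + FP) * 100, as a percentage."""
    total = tp + fp
    return 100.0 * tp / total if total else 0.0


if __name__ == "__main__":
    P, N = "positive", "negative"
    y_true = [P, P, P, P, N, N, N]
    y_pred = [P, P, P, N, P, P, N]
    tp, fp, fn = confusion_counts(y_true, y_pred)
    print(true_positive_rate(tp, fn), precision(tp, fp))
```

In this toy example, three of the four positive tweets are recovered and three of the five positive predictions are correct, so the two metrics differ even though they share the same numerator.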
Table 1: Performance Metrics Evaluation

Fig. 2: Performance Comparison (TPR (%) and Precision (%) of the DL approach and the LSTM technique)

Compared to the DL (Deep Learning) approach, the presented LSTM technique has higher TPR (True Positive Rate) and precision. Fig. 3 shows the comparison of different sentiment polarities such as negative, positive, neutral, highly positive, and highly negative. The x-

Percentage of Tweets (chart axis label)

Fig. 4: Accuracy Comparative Graph (DL approach and LSTM technique)

The LSTM technique has achieved high accuracy for Twitter tweet classification, whereas the DL approach has lower accuracy. Hence, the sentiment analysis based Twitter tweet classification using data embedded with the LSTM technique has effectively classified the different tweet polarities, such as positive, negative, neutral, highly positive, and highly negative, from the tweets collected in real time. As a result, this approach provides several benefits: it finds out the brand perception; builds stronger customer relationships; offers better
customer service; identifies key emotional triggers; discovers new marketing strategies; boosts profits; and manages crises better.

5. CONCLUSION

In this work, sentiment analysis based Twitter tweet classification using data embedded with the LSTM technique is presented. Firstly, real-time tweet data is collected from Twitter. Data labelling is done to label the tweets as positive and negative. Data pre-processing is performed to remove unnecessary data and to clean the data, since tweet data contains different characters, hashtags, and special characters. Data embedding techniques like Word2Vec and GloVe are used to convert the Twitter data into a vector representation. The LSTM technique is used to identify the sentiments. The VADER sentiment polarities are used to determine different polarities in tweets. Finally, using the SentiScore, the tweets are classified as positive, negative, neutral, highly negative, and highly positive. The performance of the presented technique is evaluated in terms of accuracy, precision, and TPR. To evaluate the effectiveness of the presented approach, it is compared with a Deep Learning (DL) approach. Compared to the DL approach, the LSTM technique has better TPR, precision, and accuracy. This approach is an effective one for classifying various real-time Twitter texts.

REFERENCES:

[1] Mohammad Eid Alzahrani, Theyazn H. H. Aldhyani, Saleh Nagi Alsubari, Maha M. Althobaiti, and Adil Fahad, “Developing an Intelligent System with Deep Learning Algorithms for Sentiment Analysis of E-Commerce Product Reviews”, Hindawi Computational Intelligence and Neuroscience, Volume 2022, Article ID 3840071, 10 pages, doi: 10.1155/2022/3840071
[2] Mohammed Kasri, Marouane Birjali, Mohamed Nabi, Abderrahim Beni-Hssane, Anas El-Ansari and Mohamed El Fissaoui, “Refining Word Embeddings with Sentiment Information for Sentiment Analysis”, Journal of ICT Standardization, 2022, Vol. 10_3, 353–382, doi: 10.13052/jicts2245-800X.1031
[3] D. Sunitha, Raj Kumar Patra, N.V. Babu, A. Suresh, Suresh Chand Gupta, “Twitter sentiment analysis using ensemble based deep learning model towards COVID-19 in India and European countries”, Pattern Recognition Letters 158 (2022) 164–170, Elsevier, doi: 10.1016/j.patrec.2022.04.027
[4] Pavlo Radiuk, Olga Pavlova, and Nadiia Hrypynska, “An Ensemble Machine Learning Approach for Twitter Sentiment Analysis”, 6th International Conference on Computational Linguistics and Intelligent Systems, May 12–13, 2022
[5] Abdalsamad Keramatfar, Hossein Amirkhani, Amir Jalaly Bidgoly, “Multi-thread hierarchical deep model for context-aware sentiment analysis”, Journal of Information Science, 1–12, The Author(s) 2021, doi: 10.1177/0165551521990617
[6] Harsh Sakhrani, Saloni Parekh, Pratik Ratadiya, “Contextualized Embedding based Approaches for Social Media-specific Sentiment Analysis”, 2021 International Conference on Data Mining Workshops (ICDMW), 2375-9259/21, 2021 IEEE, doi: 10.1109/ICDMW53433.2021.00030
[7] Guizhe Song and Degen Huang, “A Sentiment-Aware Contextual Model for Real-Time Disaster Prediction Using Twitter Data”, Future Internet 2021, 13, 163, doi: 10.3390/fi13070163
[8] Hawar Sameen Ali Barzenji, “Sentiment Analysis of Twitter Texts Using Machine Learning Algorithms”, Academic Platform Journal of Engineering and Science 9-3, 460-471, 2021
[9] Nikhil Yadav, Omkar Kudale, Aditi Rao, Srishti Gupta & Ajitkumar Shitole, “Twitter Sentiment Analysis Using Supervised Machine Learning”, Conference paper, Springer, doi: 10.1007/978-981-15-9509-7_51
[10] Akshi Kumar, Kathiravan Srinivasan, Cheng Wen-Huang, Albert Y. Zomaya, “Hybrid context enriched deep learning model for fine-grained sentiment analysis in textual and visual semiotic modality social data”, Information Processing and Management 57 (2020) 102141, doi: 10.1016/j.ipm.2019.102141
[11] Wafaa S. Albaldawi, Rafah M. Almuttairi, “Near Real Time Twitter Sentiment Analysis and Visualization”, 2nd International Scientific Conference of Al-Ayen University (ISCAU-2020), IOP Conf. Series: Materials Science and Engineering 928 (2020) 032044, doi: 10.1088/1757-899X/928/3/032044
[12] Ahmad Fathan Hidayatullah, Siwi Cahyaningtyas, Anisa Miladya Hakim, “Sentiment Analysis on Twitter using Neural