An Assessment of Sentiment Analysis of Covid 19 Tweets
Volume 7 Issue 5, September-October 2023 Available Online: www.ijtsrd.com e-ISSN: 2456 – 6470
I. INTRODUCTION
The COVID-19 epidemic ravaged every country on Earth and caused worldwide disruption. Because the virus was aggressive and prone to mutation, all hope hinged on a vaccine. Pfizer, Moderna, the makers of Covishield, and many other international corporations worked hard to develop an effective one. Although a clear majority approved of vaccination, the general population did not fully accept that some adverse effects are unavoidable. Many people's opinions are influenced by what they read or hear in the mainstream and social media. Consequently, social media played a crucial role in the communication and expression of thoughts about vaccines. Twitter in particular played a pivotal role because its features allow users to tweet (i.e., express an opinion), retweet (i.e., support an opinion), and extend comments and likes to a wider audience.

With over 500 million tweets sent every day, Twitter is a treasure trove of information that may be mined for insights if used correctly. Many academic investigations have used Twitter data. Individuals in India used Twitter as a platform to openly discuss the topic of vaccination via tweets, retweets, etc. Many insightful conclusions may be derived by analysing people's moods from the content of their tweets using sentiment analysis technologies. Sentiment analysis, also known as opinion mining, is a data-analysis method that establishes whether a piece of data is positive, negative, or neutral.

The purpose of this research is therefore to provide substantial insights by analysing the mood of tweets on vaccines. The goal of this study is to use Sentiment Analysis to perform an exploratory data analysis of the collected Twitter data. The results of this study will shed light on how the general public feels about COVID-19 vaccinations.

The study is organized as follows. Section 2 focuses on prior studies that are pertinent to the topic at hand. Sentiment analysis is defined and briefly discussed in Section 3. The dataset used for this analysis is described in depth in Section 4. In Section 5, we detail the findings from our exploratory data analysis of the dataset. Experiment findings and key insights are presented in Section 6, and conclusions and future plans in Section 7.
@ IJTSRD | Unique Paper ID – IJTSRD59976 | Volume – 7 | Issue – 5 | Sep-Oct 2023 Page 534
International Journal of Trend in Scientific Research and Development @ www.ijtsrd.com eISSN: 2456-6470
II. LITERATURE WORK
Opinion mining, or sentiment analysis, uses Natural Language Processing (NLP) to analyze the sentiment expressed in a given body of data, and it has been employed and adapted for diverse studies over time. Previous studies that have shed light on this topic are discussed below.

In paper [1], the BERT model is used to perform Sentiment Analysis on Twitter data. Tweets were geotagged in order to classify the data used in that work. The BERT model for emotion categorization was used to train on the data, and an SVM classifier was used to assess the model's effectiveness. On the whole, the acquired data was accurate to within 4%.

Paper [2] presents an analytical framework for the impact of COVID-19 on the stock market based on tweets during the outbreak. Supervised learning was used to train this model, which achieved an accuracy of 86.24 percent. The studies were conducted after the Coronavirus epidemic to aid businesses in forecasting stock prices, identifying new marketing opportunities, and monitoring their own growth.

Paper [3] analyzes what people were tweeting about most during and after the first outbreak of the COVID-19 pandemic. The authors used Latent Dirichlet Allocation (LDA) for topic extraction and a lexicon-based strategy for sentiment analysis. The report sums up the concerns of different groups during the early stages of the epidemic. Using a dataset of 600,000 English-language tweets, the model was trained on 80% of the data and tested on the remaining 20%. The article used sentiment analysis to illustrate people's thoughts on the most discussed issues.

Paper [4] examines tweets from across all of India's states during the months of November 2019 and May 2022. The authors applied sentiment analysis to the gathered information and found that, on the whole, Indians had an optimistic outlook. There was a correlation between the number of confirmed cases of COVID-19 in a given state and the number of tweets sent from that state.

The research in [5] provides a comprehensive analysis of the tone of tweets related to COVID-19. The authors evaluated the tone of the tweets using Logistic Regression, VADER sentiment analysis, and BERT sentiment analysis.

To analyze public opinion on the issue of Coronavirus, the authors of paper [6] combine data from two sources: the textual tweets posted in April 2020 from six nations and the tweets of the top 10 politicians. The report presents findings that shed light on the similarities and differences in public opinion among nations. The results showed that, across all six nations, the most strongly expressed emotions were "trust," "fear," and "anticipation."

In article [7], sentiment analysis using TF-IDF term weighting and Logistic Regression was performed on Twitter data from 30th April 2020. This algorithm classified the sentiment of the tweets with an accuracy of 94.71%.

This literature study greatly improved our understanding of prior work in this area and clarified the trajectory of our project.

III. SENTIMENT ANALYSIS
Sentiment analysis is an application of Natural Language Processing (NLP) that classifies data and texts to reveal how people feel about a topic [8]. This helps in understanding the author's intentions and point of view. The technique uses a scoring system that reflects the meaning and viewpoint of the text. Using these scores, we can more quickly identify the positive, negative, and neutral aspects of the material.

Businesses regularly use sentiment analysis, also called opinion mining or "Emotion AI", to gain insight into how customers and the wider public feel about a brand or product.

To gauge public opinion about COVID vaccinations, we apply Sentiment Analysis to data gathered from Twitter after the second wave of Coronavirus. The study's findings may shed light on the public's thoughts and feelings towards COVID-19 vaccines.

There are two main phases to any sentiment analysis:
1. Prioritizing, sanitizing, and selecting features from datasets
2. Applying Sentiment Analysis to the Data

IV. DATASET DESCRIPTION
The project began with data collection and classification. For this study, we analyzed data from the 'Covid-19 All Vaccine Tweets' collection. The data, which covers the period from December 2020 to August 2021 and consists of 80,418 tweets, was acquired from kaggle.com [9]. Table 1 lists the attributes and provides an explanation for each.
Table 1: Attributes of the dataset and their description
ATTRIBUTES          DESCRIPTION
id                  This gives the id of the tweet
user_name           User name of the person who has tweeted
user_location       The location of the person who has sent the tweet
user_description    The Twitter bio of the person writing the tweet
user_created        When the Twitter account of the user was created
user_followers      Number of followers of the person sending the tweet
user_friends        Number of friends of the person sending the tweet
user_verified       Binary value specifying whether the user is verified on Twitter or not
date                Date and time when the tweet was sent
text                The text in the tweet as it is
hashtags            Specifies all the hashtags that were used in the tweet
source              Gives information about the source (device or application) from which the tweet was sent
retweets            Number of times the tweet was retweeted
favourites          Number of people who have marked the tweet as a 'favourite'
is_retweet          Tells us if the tweet is a retweet or a new one
Following this, the tweets in the dataset were cleaned by removing mentions, hashtags, retweet information, and links.
Tweet timestamps were also removed since they were deemed unnecessary. Some of the most salient attributes from
the list above were then chosen for the exploratory analysis.
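The cleaning step described above can be sketched as follows. This is a minimal illustration using Python's standard library; the regular expressions and the function name clean_tweet are assumptions made for this example and are not taken from the study's actual implementation.

```python
import re

# Illustrative patterns (assumptions, not the study's exact rules).
RETWEET_RE = re.compile(r"^RT\s+@\w+:?\s*")      # leading "RT @user:" marker
URL_RE = re.compile(r"https?://\S+|www\.\S+")    # web links
MENTION_RE = re.compile(r"@\w+")                 # @user mentions
HASHTAG_RE = re.compile(r"#\w+")                 # #hashtags
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Strip retweet markers, links, mentions, and hashtags from one tweet."""
    text = RETWEET_RE.sub("", text)
    text = URL_RE.sub("", text)
    text = MENTION_RE.sub("", text)
    text = HASHTAG_RE.sub("", text)
    # Collapse the gaps left behind by the removals.
    return WHITESPACE_RE.sub(" ", text).strip()


sample = "RT @who: Vaccines save lives! #COVID19 https://example.org/info @unicef"
print(clean_tweet(sample))
```

The order of the substitutions matters here: the retweet marker is removed before the generic mention pattern so that the trailing colon does not survive in the cleaned text.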
V. METHODOLOGY
This section explains in depth how Sentiment Analysis was carried out on the selected dataset.
The first stage was gathering the data to be used in the analysis, as discussed at length in the preceding
section. The dataset contains clean, pre-processed data. Eighty three hundred and six records
remained after duplicates were removed from the dataset. The tweets were then cleaned by removing
any traces of mentions, hashtags, retweets, links, etc. Tweet timestamps were also removed. Then, a
subset of the attributes listed above was chosen because it was most relevant to the analysis being conducted.
Finally, a graphical study of the data was performed and a few key graphs were plotted.
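As a rough sketch of the duplicate-removal step, the snippet below keeps the first record for each distinct tweet text. The study does not state which key was used to detect duplicates, so matching on the normalized text field is an assumption, and drop_duplicate_tweets is a hypothetical helper name.

```python
def drop_duplicate_tweets(records):
    """Keep the first record for each distinct tweet text, preserving order.

    Assumption: two records are duplicates when their text matches after
    trimming whitespace and lowercasing.
    """
    seen = set()
    unique = []
    for record in records:
        key = record["text"].strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Toy records using the 'id' and 'text' attributes from Table 1.
rows = [
    {"id": 1, "text": "Got my first dose today"},
    {"id": 2, "text": "got my first dose today "},
    {"id": 3, "text": "Second dose booked"},
]
unique_rows = drop_duplicate_tweets(rows)
print([r["id"] for r in unique_rows])
```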
The number of tweets sent from each device type is displayed in Fig. 1. The majority of tweets were sent from
Android devices, followed by the Twitter Web App, and then the Cowin Vaccination Availability platform, as
seen in the provided scatter plot.
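Counts such as the ones plotted per device type can be obtained with a simple frequency tally over the source attribute. The sketch below uses toy records; the source strings shown are illustrative values, not figures from the dataset.

```python
from collections import Counter


def tweets_per_source(records):
    """Return (source, count) pairs ordered from most to least frequent."""
    counts = Counter(record["source"] for record in records)
    return counts.most_common()


# Toy records; real data would carry the 'source' attribute from Table 1.
rows = [
    {"source": "Twitter for Android"},
    {"source": "Twitter Web App"},
    {"source": "Twitter for Android"},
    {"source": "Twitter for iPhone"},
    {"source": "Twitter for Android"},
    {"source": "Twitter Web App"},
]
for source, count in tweets_per_source(rows):
    print(f"{source}: {count}")
```

The same tally, applied to user_name instead of source, would give a ranking of the most active accounts.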
Figure 2: Plot showing the number of tweets sent from verified or unverified accounts
The most popular tweets about COVID-19 vaccinations were identified by taking the 10 most retweeted
tweets from the dataset. They are shown in Fig. 3 below.
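Selecting the most retweeted tweets amounts to a top-N query on the retweets attribute. A minimal sketch, assuming each record is a dictionary with the text and retweets fields from Table 1 (top_retweeted is a hypothetical helper name):

```python
import heapq


def top_retweeted(records, n=10):
    """Return the n records with the highest retweet counts, largest first."""
    return heapq.nlargest(n, records, key=lambda record: record["retweets"])


# Toy records with made-up retweet counts.
rows = [
    {"text": "tweet a", "retweets": 12},
    {"text": "tweet b", "retweets": 450},
    {"text": "tweet c", "retweets": 3},
    {"text": "tweet d", "retweets": 97},
]
for record in top_retweeted(rows, n=2):
    print(record["retweets"], record["text"])
```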
The top 20 accounts by tweet frequency were also identified. They are shown in Fig. 4 below.
Figure 6 shows the CDF of tweet sentiments and the distribution of tweet sentiments throughout the sample.
Distribution of Sentiments Across Our Tweets
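For readers unfamiliar with the CDF of tweet sentiments mentioned above, the following sketch computes an empirical cumulative distribution from a list of sentiment scores. The scores are toy values, and empirical_cdf is a hypothetical helper, not code from the study.

```python
from bisect import bisect_right


def empirical_cdf(scores):
    """Return (value, fraction of scores <= value) for each distinct score."""
    ordered = sorted(scores)
    n = len(ordered)
    # bisect_right counts how many scores are at or below each value.
    return [(value, bisect_right(ordered, value) / n) for value in sorted(set(ordered))]


scores = [-0.5, 0.0, 0.0, 0.5]  # toy polarity scores
for value, fraction in empirical_cdf(scores):
    print(value, fraction)
```

Plotting the resulting pairs as a step function gives the CDF curve; the value reached at a score of 0 shows what share of tweets are neutral or negative.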
Figure 7: Word Clouds for the common words among the most positive and most negative tweets
To go further, we plotted word clouds from tweets about a select number of nations and places. Figures 8, 9,
and 10 show them below.
Figure 11: Word cloud for the tweets about Covishield and Covaxin
The data was then processed further to generate colour-coded word clouds for the
divided tweets. The data was cleaned once more: the tweets were filtered to remove grammatical
or spelling errors and nonsense text. After that, the whole Twitter dataset was classified into positive, negative, and
neutral categories, and a corresponding word cloud was generated for each. The result is shown in Fig. 12.
Figure 13: Colour-coded word clouds for Covishield and Covaxin
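The classification into positive, negative, and neutral categories, followed by per-category word counts of the kind a word cloud visualizes, can be sketched as below. The zero threshold and the helper names are assumptions for illustration; the study does not state the cut-off it used.

```python
from collections import Counter, defaultdict


def label_sentiment(polarity, threshold=0.0):
    """Map a polarity score to a label; the threshold is an assumed cut-off."""
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


def word_counts_by_label(scored_tweets):
    """Count word frequencies separately for each sentiment label."""
    counts = defaultdict(Counter)
    for text, polarity in scored_tweets:
        counts[label_sentiment(polarity)].update(text.lower().split())
    return counts


# Toy (text, polarity) pairs; the polarity values are made up.
scored = [
    ("vaccine works great", 0.8),
    ("vaccine side effects awful", -0.7),
    ("vaccine appointment tomorrow", 0.0),
]
counts = word_counts_by_label(scored)
print(counts["positive"].most_common(3))
```

Each Counter holds the frequencies that would determine word sizes in the cloud for that category, and the label can drive the colour coding.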
A. Polarity
A word's polarity is the degree to which it expresses a negative, neutral, or positive emotion. Words with a
positive polarity have a value of 1, neutral words have no polarity (a value of 0), and negative words have a
value of -1. The polarity of a tweet is calculated as the mean of the polarities of all the words in it, which is
a float value between -1 and +1. Tweet polarity is thus a metric that captures a tweet's emotional tone as
positive, negative, or neutral.
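The polarity calculation described above, the mean of per-word scores, can be illustrated with a toy lexicon. The lexicon entries and the function name tweet_polarity are hypothetical; a real analysis would rely on a full sentiment lexicon or library, which the text does not name.

```python
import re

# Toy lexicon (hypothetical): +1 for positive words, -1 for negative words.
# Words missing from the lexicon are treated as neutral and score 0.
TOY_LEXICON = {
    "safe": 1.0,
    "effective": 1.0,
    "grateful": 1.0,
    "scared": -1.0,
    "painful": -1.0,
}


def tweet_polarity(text, lexicon=TOY_LEXICON):
    """Mean polarity over all words in the tweet, a float in [-1, 1]."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return sum(lexicon.get(word, 0.0) for word in words) / len(words)


print(tweet_polarity("Safe and effective"))
print(tweet_polarity("Painful but effective"))
```

Because neutral words contribute 0 to the sum but still count in the denominator, long tweets with few emotional words drift towards a polarity of 0.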
B. Subjectivity
Subjectivity reflects the balance between subjective and objective content in a tweet or paragraph, and it
depends on the author. The subjectivity of a text rises as more personal opinion is included and falls as more
objective information is presented. It serves as a proxy for the author's degree of personal involvement in the
tweet or other source content.
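One simple way to make the notion of subjectivity concrete is to measure the share of words that signal personal opinion. The sketch below is only a lexicon-based proxy under that assumption; the word list and the function name tweet_subjectivity are hypothetical, and the study does not specify how its subjectivity scores were computed.

```python
import re

# Hypothetical list of opinion-bearing words, for illustration only.
SUBJECTIVE_WORDS = {"i", "feel", "think", "believe", "terrible", "amazing", "worried"}


def tweet_subjectivity(text, subjective_words=SUBJECTIVE_WORDS):
    """Fraction of words that signal personal opinion: 0 is objective, 1 is subjective."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return sum(word in subjective_words for word in words) / len(words)


print(tweet_subjectivity("I feel worried"))
print(tweet_subjectivity("The clinic opens at nine"))
```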
A small set of general statements about vaccination was then chosen and tested for polarity and
subjectivity. Figures 4 and 5 show the resulting polarity and subjectivity scores.
Figure 16: A flowchart of the methodology followed for implementation of this project
VI. RESULTS AND DISCUSSIONS
In this study, 16,05,152 tweets and retweets were analysed for attitudes towards immunization in India. The data used in the research was cleaned, pre-processed, and de-duplicated before analysis. Timestamps, mentions, hashtags, retweet markers, and links were removed from the tweets. Word clouds for various subsets of the data were generated, and key graphs such as sentiment-label plots and distribution plots were drawn to get the gist of the data. Additionally, a few general comments about vaccination were chosen and examined for polarity and subjectivity to determine how they were received. The analysis shows that the majority of Indians have a favourable opinion of vaccination, but that a strong negative attitude towards it persists in the country.

VII. CONCLUSION AND FUTURE SCOPE
The public has been affected in a variety of ways by the coronavirus, and this sort of study may aid government and other research organizations in comprehending public sentiment and filling in knowledge gaps. Twitter data analysis is crucial since Twitter is a place where many individuals express their honest, sometimes controversial, opinions. With the help of Natural Language Processing (NLP) measures such as subjectivity and polarity, the study analyses public views expressed via tweets on the Twitter network and presents the results in the form of graphs and tables. This study's findings highlight the need for increased vaccination awareness and provide new insight into the factors that make some individuals feel uneasy about being vaccinated.

Since it is crucial to grasp the public's mood in a variety of scenarios, this research has significant future potential. Sentiment analysis may be conducted again to examine public opinion on the third wave, public opinion on immunization delay in India, and similar subjects. There is a wealth of relevant data at our disposal, which we may decipher and analyse for use as a springboard for future action. Any piece of private information may be utilized as a dataset to help analyse the tone of a tweet or other piece of data. In today's fast-moving, data-driven world, having a metric to better comprehend the reasoning behind opinions is critical.
REFERENCES
[1] Singh, M., Jakhar, A.K. & Pandey, S. Sentiment analysis on the impact of coronavirus in social life using the BERT model. Soc. Netw. Anal. Min. 11, 33 (2021). https://doi.org/10.1007/s13278-021-00737-z
[2] International Journal of Computational Intelligence Research ISSN 0973-1873 Volume 16, Number 2 (2020), pp. 87-104 © Research India Publications https://dx.doi.org/10.37622/IJCIR/16.2.2020.87-104
[3] Manal Abdulaziz, Alanoud Alotaibi, Mashail Alsolamy and Abeer Alabbas, “Topic based Sentiment Analysis for COVID-19 Tweets” International Journal of Advanced Computer Science and Applications (IJACSA), 12(1), 2021. http://dx.doi.org/10.14569/IJACSA.2021.0120172
[4] T. Vijay, A. Chawla, B. Dhanka and P. Karmakar, "Sentiment Analysis on COVID-19 Twitter Data," 2020 5th IEEE International Conference on Recent Advances and Innovations in Engineering (ICRAIE), 2020, pp. 1-7, doi:10.1109/ICRAIE51050.2020.9358301.
[5] A. J. Nair, V. G and A. Vinayak, "Comparative study of Twitter Sentiment on COVID - 19 Tweets," 2021 5th International Conference on Computing Methodologies and Communication (ICCMC), 2021, pp. 1773-1778, doi:10.1109/ICCMC51019.2021.9418320.
[6] G. Matošević and V. Bevanda, "Sentiment analysis of tweets about COVID-19 disease during pandemic," 2020 43rd International Convention on Information, Communication and Electronic Technology (MIPRO), 2020, pp. 1290-1295, doi:10.23919/MIPRO48935.2020.9245176.
[7] Imamah and F. H. Rachman, "Twitter Sentiment Analysis of Covid-19 Using Term Weighting TF-IDF And Logistic Regression," 2020 6th Information Technology International Seminar (ITIS), 2020, pp. 238-242, doi:10.1109/ITIS50118.2020.9320958.
[8] “Sentiment Analysis” https://brand24.com/blog/sentiment-analysis/
[9] “Dataset” https://www.kaggle.com/gpreda/allcovid19-vaccines-tweets