Hybrid Annotation and Classification For Predicting Attitudes Towards COVID 19 Vaccines For Arabic Tweets
https://doi.org/10.1007/s13278-024-01294-x
ORIGINAL ARTICLE
Abstract
In March 2020, the whole world suffered from the coronavirus pandemic. Coronaviruses come in many forms, some of which can be fatal, and they mainly affect the human respiratory system. Developing COVID-19 vaccines became a global goal to stop the spread of the deadly disease. By the end of 2020, the first vaccines started to become available. Some countries began their immunization campaigns early, while others awaited the outcome of successful trials. This research explores classifying users’ hesitation or confidence about COVID-19 immunizations. To determine the sentiment of tweets related to vaccines, we collected tweets in Arabic related to various vaccines. After collecting the tweets, we preprocessed them using natural language processing (NLP) techniques. We then developed a hybrid data annotation approach that uses three different lexicons to detect the polarity of the data. Finally, several machine learning (ML) and deep learning (DL) methods, such as Multinomial Naïve Bayes (MNB), logistic regression (LR), support vector machine (SVM), long short-term memory (LSTM), Gated Recurrent Unit (GRU), convolutional neural network (CNN), and the hybrid combinations CNN-LSTM and CNN-GRU, were used and compared. Experimental results revealed that the proposed hybrid annotation method outperformed the conventional one in predicting the confidence or hesitation of people regarding COVID-19 vaccines. The maximum accuracy achieved was 98.1% using the hybrid CNN-GRU with a hybrid approach to data annotation.
Keywords Sentiment analysis · Natural language processing (NLP) · Deep learning (DL) · Recurrent neural networks
(RNN) · Long short-term memory (LSTM) · Gated recurrent units (GRUs)
137 Page 2 of 15 Social Network Analysis and Mining (2024) 14:137
rates, Western countries were leading. The United Kingdom (UK) was the first country in the world to approve the BioNTech Pfizer vaccine (Nbc news 2021b). Fewer than two-thirds of Arabs complained about COVID-19 vaccinations (Kaadan et al. 2021). Comparatively, this rate is lower than the global average. To achieve efficient immunization in Arabic countries, it is essential to develop strategies that promote vaccine acceptance and attain optimal coverage.

Undoubtedly, social media is one of the most widely adopted mediums for expressing thoughts, feelings, and opinions (Web 2022a). Twitter is one of the world’s most popular social media sites, where users can post whatever they want. Twitter has over 100 million active users (Ndasauka et al. 2016), with up to 500 million tweets posted per day (Cureg et al. 2019).

Social media is now serving as a forum for gathering public opinions. Through it, people may easily share their news and personalize their own experiences. Many people can respond to comments and express their feelings. Social media may also be used to help people and governments get through disasters and crises.

The main motivation for this work is that such trends may pick up quickly; analyzing trends on social platforms helps avoid problems, understand people’s opinions, and manage situations.

Sentiment analysis, or opinion mining, is an effective tool for automatically detecting the emotions expressed on social platforms. Applying sentiment analysis can be useful for decision making. It has several applications in various fields, including businesses and organizations, which require customer opinions regarding the products they market and services they produce. This paper aims to analyze the sentiments expressed through tweets since the beginning of the COVID-19 epidemic. The study starts with the collection of tweets from social media. The data was cleaned, and then ML and deep learning models were used to extract the polarity of the text (Chakraborty et al. 2020). Deep learning and machine learning algorithms, particularly LSTM and GRU and their variants, have demonstrated promising results in extracting emotional polarity from text. Deep learning provides a way to handle complicated data processing and computing, and these recurrent models are among the most widely used methods in supervised deep learning research for natural language processing (Ibrahim 2019). Various RNN variants (LSTM and GRU) and conventional machine learning methods were implemented in this research. A new Arabic lexicon for COVID-19 was created and combined with existing lexicons to enhance the classification performance. The suggested method was used to categorize Arabic tweets according to their polarity.

In earlier investigations, COVID-19 data from Twitter was successfully used (Aljameel et al. 2021; Ali 2021; Alhuri et al. 2020; Alshammari and Alanazi 2020; Hayawi et al. 2022). However, because COVID-19 immunization was introduced later, there have not been many studies investigating this topic in Arabic. This work can help health strategists and Arabic government officials make informed decisions about vaccination tactics for COVID-19 or comparable pandemics.

1.1 Motivation and contributions

This research is framed by the following main research questions:

1. How can we create a hybrid annotation framework for labeling the polarity of Arabic tweets automatically?
2. How can the sentiment polarity of user-generated posts about the COVID-19 vaccine be improved using hybrid annotation?
3. Which classification algorithms and annotation techniques perform better when applying ML and deep learning techniques for polarity classification in tweets, such as LSTM, GRU, and hybrid versions of them?

Our contributions and the novelty of the paper can be summarized as follows:

• Building a private COVID-19 Arabic tweets dataset that was used in the experiments.
• Developing a new Arabic lexicon specifically tailored for COVID-19 to enhance data annotation and automate the labeling of polarity in Arabic tweets.
• Developing a hybrid annotation approach based on voting results of three lexicon dictionaries.
• Developing hybrid models CNN-LSTM and CNN-GRU based on CNN, LSTM, and GRU to improve the performance of the classification models.
• A performance comparison of ML and deep learning methods and their hybrid versions with different annotation methods was performed.
• Authorities can use this data to make informed decisions about vaccination campaigns and to monitor developments in real time.

This study has some limitations that may affect its generalizability. The study findings may not represent the broader Arabic-speaking population due to potential biases in selecting the tweets or participants. The study’s findings may be specific to the period and context in which the data was collected. Attitudes towards COVID-19 vaccines can change, influenced by various factors such as emerging information, news events, or changes in public health policies. Therefore, the study’s conclusions may not apply to different time frames or regions.
The article follows this structure: Firstly, in Sect. 2, we review the literature concerning the analysis of emotions from the text. Subsequently, in Sect. 3, we delve into the methodology, encompassing data collection and the proposed system architecture. Section 4 showcases the experimental findings and initiates a discussion. Finally, in Sect. 5, we present our conclusions and offer suggestions for future research endeavors.

2 Related work

Numerous investigations and research projects have been carried out regarding sentiment analysis, such as Aljameel et al. (2021); Ali (2021); Alhuri et al. (2020); Alshammari and Alanazi (2020); Hayawi et al. (2022). Social media data is a highly useful alternative to classical surveys or interviews for determining people's feelings (Khan et al. 2022). There are many methods for sentiment analysis, including lexicon-based, machine-learning, and hybrid techniques (Madhoushi et al. 2015).

Research studies have been conducted based on a variety of factors, such as the investigation of Twitter data in an attempt to determine how the Ebola virus spreads (Liang et al. 2019) and the COVID-19 outbreak (Prabhakar Kaila and Prasad 2020), and tracking public opinions on Twitter during pandemics (Szomszor et al. 2011).

In Aljameel et al. (2021), the authors developed an Arabic COVID-19 tweets dataset. They applied different ML models (SVM, KNN, and NB) with N-gram and TF-IDF features to predict a person’s knowledge of the protective measures. The results showed that SVM recorded the highest performance. According to the awareness forecast, the center region had the lowest level of awareness of COVID-19 containment measures, while the south region had the highest level of awareness.

In Ali (2021), the authors first collected tweets related to online learning to carry out a thorough sentiment analysis (SA) and emotion mining during the COVID-19 outbreak. Second, the eight emotions were counted according to their emotional weight by analyzing the National Research Council Canada (NRC) Word-Emotion lexicon. Third, a filtering method called information gain (IG) was applied. Fourth, the underlying causes of the unfavorable attitudes were identified and examined. Finally, several classification methods were implemented, including SVM, KNN, MNB, LR, and NB. SVM achieved the best result among all models, with an accuracy of 89.6%.

A recurrent neural network (RNN) was used to analyze Arabic tweets using sentiment analysis in Alhuri et al. (2020) for tweets collected from April 9th to 11th, 2020, at the beginning of the pandemic. RNNs have been utilized for sentiment analysis. However, they were not heavily employed in COVID-19 Arabic sentiment analysis. In that study, the authors investigated the performance of RNNs, specifically long short-term memory (LSTM) and gated recurrent unit (GRU), in classifying Arabic sentiment about the COVID-19 crisis. Using Global Vectors (GloVe) as a word embedding technique, the results showed that GRU outperforms LSTM, achieving 81% precision. In Alshammari and Alanazi (2020), the authors developed a multi-annotation scheme for named entity recognition tasks for diseases. More than sixty thousand words in the dataset were manually annotated using the inside-outside (IO) annotation scheme by two separate annotators. The inter-annotator agreement rate was computed to guarantee the dependability of the annotation process, and it received a score of 95.14%. The annotation schemes are IE, BI, BIES, IOB, IOB, and IOE. Every record in the dataset additionally provides for five linguistic features: lexical markers, stopwords, gazetteers, and part-of-speech tags.

Researchers in Hayawi et al. (2022) gathered and interpreted tweets about COVID-19 vaccines. They used machine learning (ML) algorithms to identify false information about vaccines. Using credible sources and validation from medical professionals, more than 15,000 tweets were categorized as containing accurate or incorrect information about vaccines. The LSTM, BERT transformer models, and XGBoost classification models were investigated. In Noor et al. (2022), the authors analyzed Arabic tweets regarding COVID-19 using ML and DL to determine what people think about vaccines in various areas. They gathered tweets, sorting 12,000 Arabic tweets into three categories: positive, neutral, and negative sentiments. Finally, geographic information systems (GIS), with a focus on Saudi Arabia, were utilized to map the spatial distribution of people’s perceptions and emotions. According to experimental results, SVC performs better and is more accurate than other approaches.

In Malki et al. (2022), the authors used the Seasonal Autoregressive Integrated Moving Average (SARIMA) model to predict the total number of vaccinations in the upcoming days. The SARIMA model parameters are examined using a grid search approach to obtain the most accurate forecast. Sentiment analysis research in Cotfas et al. (2021) investigated the shifts in public opinion on Twitter in the first month after the start of the UK vaccination procedure, focusing on the COVID-19 vaccines. For this, a dataset of 5,030,866 tweets in the English language was collected between December 8, 2020, and January 7, 2021, from Twitter. A stance analysis was performed after contrasting several conventional machine learning and deep learning models. Using hashtags, n-grams, and latent Dirichlet allocation, the main debate themes were found, and the tweets about COVID-19 vaccine reluctance were examined in relation to the important events that occurred throughout the study period.
The results of this study can help those involved to respond to worries regarding COVID-19 vaccine resistance more effectively.

Most research studying vaccines was done for materials written in the English language. Only a few studies focus on Twitter data related to COVID-19 attitudes towards vaccination written in Arabic (Vorovchenko et al. 2017; Villavicencio et al. 2021; Chaudhri et al. 2021; Sattar and Arifuzzaman 2021). Some studies focus on COVID-19 in Arabic, such as Aljameel et al. (2021); Ali (2021); Alhuri et al. (2020); Aljabri et al. (2021).

To the best of the authors’ knowledge, few studies focus on COVID-19 vaccination opinions in Arabic. Arabic NLP and, particularly, sentiment analysis still need many improvements. Moreover, the Arabic language imposes many challenges due to its complex structure, various dialects, and the lack of data sources and benchmarks.

3 Methodology

This section delves into the five phases of the methodology employed to develop the sentiment analysis system for Arabic COVID-19 tweets, as illustrated in Fig. 1. The first phase focuses on gathering relevant data. We use Twitter’s API to collect Arabic tweets containing keywords and hashtags related to COVID-19. Once collected, the data undergoes a meticulous cleaning process. This process removes irrelevant information such as retweets, duplicates, URLs, and mentions (e.g., usernames). Additionally, text normalization techniques are applied to address inconsistencies in Arabic script. Following cleaning, the preprocessed tweets move on to Phase 2, where they are assigned sentiment labels (positive, negative, or neutral). This crucial step utilizes a data annotation model (DAM). The DAM uses sentiment lexicons to analyze the content and assign appropriate sentiment labels. Phase 3 is considered the heart of the system, which focuses on sentiment classification. Here, we leverage various techniques, including NLP, machine learning, and deep learning, to build robust sentiment classification models. These models learn from the labeled data in Phase 2, enabling them to automatically categorize the sentiment of unseen tweets. Once the sentiment classification models are built, Phase 4 assesses their effectiveness and efficiency. This evaluation involves applying the models to unlabeled data (tweets not used for training). Metrics like accuracy, precision, recall, and F1-score are used to gauge the models’ performance in classifying sentiment for Arabic COVID-19 related tweets. Figure 1 visually depicts the entire workflow of the proposed methodology. This visualization provides a clear understanding of the sequential steps involved in analyzing the sentiment of Arabic COVID-19 tweets.

3.1 Data collection

Twitter was used to gather data since it is where individuals express their opinions and ideas. This makes it a perfect platform for discovering people’s thoughts and feelings regarding various subjects. With perspectives on a wide range of topics, including social issues, politics, business, and economics, Twitter is recognized as a trove of passionate writing. We used the Twitter API (Nbc news 2022b) to collect around 29,000 Arabic tweets about COVID-19 vaccines. Then, we filtered the tweets by keywords associated with different COVID-19 vaccines, such as "موديرنا" and "سينوفارم". The tweets were collected from 16 July 2020 to 26 July 2021. We also only collected tweets in Arabic. Twitter data collection consists of four steps. The first step is obtaining access to the Twitter developer API. We then wrote Python code with the Tweepy library (Nbc news 2021c), using an access token to access Twitter data. Next, we entered the previously mentioned search keywords related to COVID-19 vaccines and filtered the results to keep only Arabic tweets. Finally, we saved the collected data as a dataset.

3.2 Data preprocessing

• We cleaned the tweets and removed noise and irrelevant symbols, duplicated sentences, URLs, punctuation marks, numbers, or combinations. We also omitted all symbolic and non-Arabic alphabets and special characters.
• Stopwords removal: This is accomplished by removing the stop words. We use this list of Arabic stop words: قبل, صباحا, قليل, عليه, فيه, لكن, ضمن, على, عندما, هنا, منذ, حول, هذه, و, فيها, ف, ولم, ل, ما, ال, بسبب, فوق, مع, هناك, أما, بأن, يكون, بان, لهم, لها, هي, أ, كانت, هم, بها, حاليا, بعض, آخر, ثانية, انه, من, الان, جدا, به, بن, إنه, خارج, الى, وجود, عند, تحت, وقد, فان, إلى, ينبغي, أنفسهم, تم, نحن, كان, ثم, فعل, أنفسكم, يمكن, جميع, كل, فهى, فهي, إذا, أنها, هؤلاء, مرة, اى, أنه, التي, التى, ليس, الذين, تكون, له, لن, لم, ديك, في, فى, وقال, الآن, لي, و, فه, وإن, ضد, لك, إنها, كلا, يجري, لماذا, وأن, دون, ان, حتى, بعد, وان, نحو, نفسي, كما, بعيدا, عليها, وما, خلال, بين, ولا, لديه, أى, انها, منها, إما, ومن, معظم, إلا, فما, أيضا, هى, وكانت, لقاء, ولن, انت, نفسه, نفسك, هو,
وهى, وهو, وهي, بينما, الا. It was noted that this could reduce the index size. We removed some words such as قبل, حول, منذ, هنا, عندما, على, ضمن, فيه, عليه, قليل, صباحا, لهم, بان, يكون, بأن, أما, هناك, مع, فوق, بسبب, ما, هذه, و, فيها, ل, ولم, ف, آخر and others that are repeated frequently throughout the dataset without carrying any meaning, as they are unimportant to the interpretation. We excluded all derogatory terms from this list to protect the sentiment analysis.
• The text is first tokenized, which separates it into words (the smallest unit). Stemming: In this step, the words were standardized by being truncated to their stems. The Porter Stemmer from the NLTK library was employed (Web 2021d).

3.3 Data annotation

The process of building the model for tweet annotation works as follows. The input of this method was an Excel file containing preprocessed, unlabeled tweets, and the output was a collection of categorized and annotated tweets with their sentiment (positive, negative, or neutral). This section explains the process of developing a hybrid data annotation method. Data is classified as positive, negative, or neutral during this phase. Due to the challenges with Arabic text, we utilized a mix of semantic techniques in this study. Additionally, there is currently a lack of publicly available tweet datasets for testing target-dependent Twitter Arabic text sentiment classification. There are two ways to create a sentiment lexicon: manually or automatically. We have used a combination of both approaches in this work. Creating a sentiment lexicon automatically is a challenging task, and its success can have a significant impact on sentiment analysis systems based on lexicons. This difficulty arises particularly when dealing with large input or evaluation data. The primary issue in establishing a lexicon can be similar to the challenge of creating a language parser. Additionally, sentiment analysis is highly variable. In this research, we propose a hybrid approach to data annotation, which is illustrated in Fig. 4. This figure presents the method used to estimate the sentiment (label) of a tweet. We used three Arabic lexicons:

• Semval (2016): the third lexicon is added from Nbc news (2021e). Semval2016 is a dataset of tweets manually annotated for stance towards a given target, the target of opinion (opinion towards), and sentiment (polarity). Polarity refers to the sentiment orientation of a text, which can be positive, negative, or neutral. It focuses on determining the sentiment expressed in a piece of text. We used AraSenTi and Semval because they have a vast collection of Arabic tweets and achieved good performance.

The process works as follows. Each tweet in this file is tokenized into a set of words w1, w2, w3, .... Then, for each dictionary, we calculate the polarity of each sentence and sum up the polarity scores to represent the sentiment of the whole tweet. If this score is greater than 1, then the tweet is defined as positive. If the value is 0, the tweet is neutral; otherwise, it is defined as negative, as illustrated in Fig. 2. Finally, we conducted a hard vote on all of the results to determine the annotation decision.

The following scenario demonstrates sentiment analysis on a tweet using three Arabic lexicons: Covid-19 (2019), AraSenTi (2016), and Semval (2016). Our task is to categorize the sentiment of the tweet as positive, negative, or neutral. After analyzing the tweet, each lexicon assigns a sentiment label:

Covid-19: Positive
AraSenTi: Negative
Semval2016: Negative

Since two out of the three lexicons (AraSenTi and Semval2016) classify the sentiment as negative, the final sentiment assigned to the tweet is negative.

This approach highlights the importance of using multiple sentiment analysis tools, as they may provide different classifications due to their varying strengths and weaknesses. Combining the results from multiple sources can lead to more robust and accurate overall sentiment analysis results.
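The scoring and hard-voting procedure of the hybrid annotation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names (score_tweet, label_from_score, hard_vote, annotate), the toy lexicons in the usage note, and the tie-breaking rule (falling back to neutral when no label wins a majority) are all hypothetical. The thresholds follow the rule as stated in the text (score greater than 1 is positive, 0 is neutral, anything else is negative).

```python
from collections import Counter


def score_tweet(tokens, lexicon):
    """Sum the polarity scores of the tokens that appear in one lexicon."""
    return sum(lexicon.get(tok, 0) for tok in tokens)


def label_from_score(score, pos_threshold=1):
    """Map a summed score to a label using the rule stated in the text."""
    if score > pos_threshold:
        return "positive"
    if score == 0:
        return "neutral"
    return "negative"


def hard_vote(labels, tie_label="neutral"):
    """Majority vote over the per-lexicon labels.

    Assumption: when the top two labels are tied, return tie_label.
    """
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return tie_label
    return counts[0][0]


def annotate(tokens, lexicons):
    """Label one tokenized tweet by voting over all lexicons."""
    labels = [label_from_score(score_tweet(tokens, lex)) for lex in lexicons]
    return hard_vote(labels)
```

With the labels from the scenario in the text, hard_vote(["positive", "negative", "negative"]) returns "negative". Toy lexicons such as {"safe": 2}, {"safe": 3}, and {"risk": -1} applied to the tokens ["safe", "vaccine"] give two positive votes and one neutral vote, so annotate returns "positive".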
ment analysis due to their ability to process sequential data, capture long-term dependencies, handle variable-length inputs, address the vanishing gradient problem, and adaptively learn relevant features. These properties make them well-suited for modeling the complex nature of natural language and extracting sentiment-related information from text.
– GRU: “GRU” stands for “Gated Recurrent Unit,” which is a type of architecture used in recurrent neural networks (RNNs). RNNs are a type of artificial neural network designed to process sequential data, such as text or time series. In 2014, the GRU was presented as an alternative to LSTM to reduce complexity. Unlike LSTM, GRU does not have an output gate, which results in fewer trainable parameters.

3.6.1 Hybrid models combining CNN-LSTM

Figure 3 illustrates the architecture of a combined CNN-LSTM deep learning framework. The model consists of eight layers, including an embedding layer, one convolutional layer, one pooling layer, two LSTM layers, two dense layers, a dropout layer, and a fully connected layer. Each word in the pre-processed dataset has a unique ID, and the word sequence is both original and meaningful. Here’s a brief description of each layer:

• The embedding layer represents each word with a vector. It initializes random weights for the words and learns an embedding for every term in the training dataset. Although this layer serves multiple purposes, its primary objective is to embed words so they can be used in later models.
• Convolutional layers receive the words from the embedding layer in the form of sentences. They apply filtering to the input word matrix. The filtering process is useful for providing a map of features (Wu et al. 2021) that indicates the pattern of the input data. The ReLU activation function is used to identify the features within the tweets. The output of the convolution layer is passed through pooling layers, which reduce the representation of input sentences, input parameters, and computation in the network. Moreover, this prevents overfitting in the network. Numerous studies have used the ReLU activation function to improve the network’s training (Wang et al. 2022). It has proved to resist the vanishing gradient problem.
• A pooling layer is typically added after the convolutional layer to help mitigate the limitations of the invariance (Li et al. 2022) of the produced feature maps, whereas the activation function is used to control the overfitting of the network.
• A dropout layer is a clever approach to solve the overfitting problem. It is normally used in the creation of deep learning models (Yin et al. 2020). In this layer, neurons are randomly chosen, and some of them are deactivated during training to prevent overfitting in the LSTM layer.
• As demonstrated in Sect. 3.4.4, LSTM is a kind of recurrent neural network that is utilized for prediction based on learning long-term dependencies. In this study, we constructed the hybrid model for polarity classification using LSTM.
• Two GRU layers are used to model temporal features. Finally, a dense layer is used to predict the output. A dense layer, also called a “fully connected layer,” is used to perform classification of the extracted features (Rehman et al. 2019) of the convolutional layers.
• Using a dense layer, every current input (or neuron) in the network layer is connected to every input (or neuron) in the layer that follows it. On the other hand, the GRU module aims to capture the long-term dependency in the data (Wu et al. 2020) and can learn useful information from historical data for a long period through the memory cell, whereas the forget gate discards useless information.
• The output of the sequence learning block is coupled to a final classification layer to produce the output’s final form.

3.6.2 Hybrid models combining CNN-GRU

The GRU-CNN hybrid neural network is suggested as a way to combine the benefits of the CNN module, which excels at processing high-dimensional data, and the advantages of the GRU module, which can process time sequence data
4.2 Experimental settings

We conducted several experiments to evaluate the proposed Arabic text annotation and categorization methods. We performed classification experiments on the prepared dataset using various data annotation models that included both positive and negative data. Moreover, 67% of the dataset was used for training, while 33% was used for testing the implemented combinations of classification models. Concerning classification models, convolutional layers are employed to minimize the dimensionality of the input data and capture sequence information (Qing et al. 2019) with initialized filters. The convolutional procedure is carried out in the convolutional layer. Twenty filters with a window size of 2 slide over the text representation in the convolution layer to extract the features. As the filter runs, many sequences of syntactical and semantic properties are generated.

Maximized pooling layer: In natural language processing, variable-length tensors can be transformed into fixed-length tensors using the pooling layer.
Subsampling: The pooling layer collects more important data. There are two types of pooling layers: average pooling layers and maximum pooling layers.
K-max pooling layer: After the convolutional layer, k-max pooling is applied in the network to maintain the order of features. This ensures that the input to the next layer is not dependent on the length of the input sentence.
Maximum pooling result matrix: By changing the value of k in each feature sequence, more important maps are extracted.
Multilayer GRU (Weng et al. 2019): The feature sequences obtained in parallel from the k-max pooling layer do not fully use sequence information. GRU specializes in sequential modeling and can extract contextual information from feature sequences, giving it the ability to boost the model’s performance.

For CNN-LSTM, we use a sigmoid activation function, binary cross-entropy loss, the Adam optimizer, and a batch size of 200.

4.2.1 Performance metrics

Classification performance is typically measured using precision, recall, f-measure, and accuracy metrics. TP, FP, TN, and FN denote the numbers of true positives, false positives, true negatives, and false negatives, respectively. A true positive (TP) is a positive sample that the model predicts as positive, and a true negative (TN) is a negative sample that the model predicts as negative. A false positive (FP) is a negative sample that the model predicts as positive, while a false negative (FN) is a positive sample that the model predicts as negative.

\[ \text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{4} \]

\[ \text{Precision} = \frac{TP}{TP + FP} \tag{5} \]

\[ \text{Recall} = \frac{TP}{TP + FN} \tag{6} \]

\[ \text{F1-score} = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} \tag{7} \]

The ROC is a graphical curve that evaluates binary classification performance at various thresholds. It plots the true positive rate (TPR) against the false positive rate (1 − specificity), showing the tradeoff between TPR and specificity.
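The performance metrics in Eqs. (4) to (7) can be computed directly from the four confusion counts. The snippet below is a small sketch for illustration only: the function names (confusion_counts, classification_metrics), the choice of 1 as the positive label, and the convention of returning 0.0 when a denominator is zero are assumptions made here and are not taken from the study.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN for a binary task (positive label is an assumption)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive and truth != positive:
            fp += 1
        elif pred != positive and truth != positive:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, and F1-score as in Eqs. (4)-(7).

    Assumption: a metric with a zero denominator is reported as 0.0.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

For example, with true labels [1, 1, 0, 0, 1] and predictions [1, 0, 0, 1, 1], the counts are TP = 2, FP = 1, TN = 1, FN = 1, which gives an accuracy of 0.6 and a precision, recall, and F1-score of 2/3 each.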
We conducted several experiments to show the impact of using the proposed hybrid annotation and classification methods for classifying user opinions regarding COVID-19 vaccines on social media for Arabic tweets using sentiment analysis.

4.3 Results of applying classification models with Semval2016

Table 1 shows the results obtained for different classification algorithms, including LSTM, CNN-LSTM, and CNN-GRU, using the Semval2016 lexicon only for data annotations. The table indicates that CNN-LSTM has the best accuracy, at 92.50%, among all the classifiers, whereas LSTM and CNN-GRU gave almost similar accuracy. These results indicate that the hybrid model CNN-LSTM is the best in terms of accuracy when only the Semval2016 lexicon is used for data annotations.

Table 2 Result of classifiers using AraSenTi
Model      Labeling   Accuracy   Recall   F1-score   Precision
CNN-LSTM   AraSenTi   92.20      91.45    90.91      90.44
CNN-GRU    AraSenTi   92         91.03    90.64      90.2873
LSTM       AraSenTi   83.17      73.18    76.18      87.74

CNN-GRU is the best in terms of accuracy in the case of using the Covid-19 lexicon only for data annotations.

In all instances employing various lexicons, the hybrid LSTM classifier outperformed the other classifiers in terms of accuracy. Accuracy (the percentage of correctly identified test records), precision (positive predictive value), recall (sensitivity, or true positive rate), and f-score (the harmonic mean of precision and recall) were used as performance indicators.

4.6 Results of applying classification models with hybrid data annotations
Fig. 10 ROC curve of a hybrid data annotation and SVM model
Fig. 12 ROC curve of a hybrid data annotation and CNN-LSTM model
4.7 Discussion
5 Conclusions
Table 5 Comparison with existing related work
References              Dataset          Models    Data annotation model          Model performance
Aljameel et al. (2021)  Private dataset  SVM       NA                             Accuracy 85%
Ali (2021)              Private dataset  SVM       Adjective lexicon              Accuracy 89.6%
Alhuri et al. (2020)    Private dataset  GRU       NA                             F1 score 81
Mubarak et al. (2022)   ArCovidVac       AraBERT   Different layered annotation   Accuracy 86.4
Hayawi et al. (2022)    Private dataset  XGBoost   Manual annotation              Accuracy 95.6%
Our work                Private dataset  CNN-GRU   Hybrid data annotation model   Accuracy 98.20%
Our work                Private dataset  CNN-LSTM  Hybrid data annotation model   Accuracy 94.71%
Our work                Private dataset  LSTM      Hybrid data annotation model   Accuracy 92.62%
Our work                Private dataset  SVM       Hybrid data annotation model   Accuracy 92.47%
Our work                Private dataset  MNB       Hybrid data annotation model   Accuracy 81.2%
Our work                Private dataset  LR        Hybrid data annotation model   Accuracy 91.20%
annotation, demonstrates good performance when analyzing how people feel about the COVID-19 vaccination. A maximum accuracy of 98.1% was obtained when employing a hybrid combination of CNN and GRU together with a hybrid approach to data annotation. Hybrid annotation and classification for predicting attitudes towards COVID-19 vaccines in Arabic tweets can have several implications for society. Some potential effects are:

• Understanding Public Sentiment: By applying hybrid annotation and classification techniques to analyze Arabic tweets about COVID-19 vaccines, researchers and policymakers can gain a deeper understanding of public sentiment and attitudes towards vaccination in the Arabic-speaking population.
• Identifying Vaccine Hesitancy: Hybrid annotation and classification methods can help identify patterns of vaccine hesitancy or resistance within the Arabic-speaking community. This knowledge can assist health authorities in tailoring their vaccination campaigns to address specific concerns and increase vaccine acceptance.

In future work, we plan to evaluate other social media platforms. Moreover, we will use big data platforms and explainable machine learning methods.

Acknowledgements We thank Minia University and STDF for the financial support for publishing this work.

Author contributions Authors contributed equally.

Funding Not applicable.

Availability of data and materials Data are available upon reasonable request from the corresponding author.

Declarations

Conflict of interest No competing interests.

Ethical approval and consent to participate Not applicable.

Consent for publication Not applicable.

References

Aldayel HK, Azmi AM (2016) Arabic tweets sentiment analysis-a hybrid scheme. J Inf Sci 42(6):782–797
Alhuri LA, Aljohani HR, Almutairi RM, Haron F (2020) Sentiment analysis of COVID-19 on Saudi trending hashtags using recurrent neural network. In: 2020 13th International conference on developments in eSystems engineering (DeSE). IEEE, pp 299–304
Ali MM (2021) Arabic sentiment analysis about online learning to mitigate COVID-19. J Intell Syst 30(1):524–540
Aljabri M, Chrouf SMB, Alzahrani NA, Alghamdi L, Alfehaid R, Alqarawi R, Alhuthayfi J, Alduhailan N (2021) Sentiment analysis of Arabic tweets regarding distance learning in Saudi Arabia during the COVID-19 pandemic. Sensors 21(16):5431
Aljameel SS, Alabbad DA, Alzahrani NA, Alqarni SM, Alamoudi FA, Babili LM, Aljaafary SK, Alshamrani FM (2021) A sentiment analysis approach to predict an individual’s awareness of the precautionary procedures to prevent COVID-19 outbreaks in Saudi Arabia. Int J Environ Res Public Health 18(1):218
Alshammari N, Alanazi S (2020) An Arabic dataset for disease named entity recognition with multi-annotation schemes. Data 5(3):60
Amelio A, Bonifazi G, Corradini E, Ursino D, Virgili L (2022) A multilayer network-based approach to represent, explore and handle convolutional neural networks. Cognit Comput 15(1):1–29
Batra R, Imran AS, Kastrati Z, Ghafoor A, Daudpota SM, Shaikh S (2021) Evaluating polarity trend amidst the coronavirus crisis in peoples’ attitudes toward the vaccination drive. Sustainability 13(10):5344
Chakraborty K, Bhatia S, Bhattacharyya S, Platos J, Bag R, Hassanien AE (2020) Sentiment analysis of COVID-19 tweets by Deep Learning Classifiers: a study to show how popularity is affecting accuracy in social media. Appl Soft Comput 97:106754
Chaudhri AA, Saranya S, Dubey S (2021) Implementation paper on analyzing COVID-19 vaccines on twitter dataset using tweepy and text blob. Annal Rom Soc Cell Biol 8393–8396
Cotfas L-A, Delcea C, Gherai R (2021) COVID-19 vaccine hesitancy in the month following the start of the vaccination process. Int J Environ Res Public Health 18(19):10438
Cureg MQ, De La Cruz JAD, Solomon JCA, Saharkhiz AT, Balan AKD, Samonte MJC (2019) Sentiment analysis on tweets with punctuations, emoticons, and negations. In: Proceedings of the 2019 2nd international conference on information science and systems, pp 266–270
Hayawi K, Shahriar S, Serhani MA, Taleb I, Mathew SS (2022) Anti-vax: a novel twitter dataset for COVID-19 vaccine misinformation detection. Public Health 203:23–30
Ibrahim N (2019) Text mining using deep learning article review. Int J Sci Eng Res 9(1916):11
Kaadan MI, Abdulkarim J, Chaar M, Zayegh O, Keblawi MA (2021) Determinants of COVID-19 vaccine acceptance in the Arab world: a cross-sectional study. Glob Health Res Policy 6(1):1–7
Khan AKA, Majumdar D, Mondal B, Mukherjee S (2022) A deep learning approach to sarcasm detection from composite textual data. INFOCOMP J Comput Sci 21(2)
Li S, Wu C, Xiong N (2022) Hybrid architecture based on CNN and transformer for strip steel surface defect classification. Electronics 11(8):1200
Liang H, Fung IC-H, Tse ZTH, Yin J, Chan C-H, Pechta LE, Smith BJ, Marquez-Lameda RD, Meltzer MI, Lubell KM et al (2019) How did Ebola information spread on twitter: broadcasting or viral spreading? BMC Public Health 19(1):1–11
Madhoushi Z, Hamdan AR, Zainudin S (2015) Sentiment analysis techniques in recent works. In: 2015 Science and information conference (SAI). IEEE, pp 288–291
Malki A, Atlam E-S, Hassanien AE, Ewis A, Dagnew G, Gad I (2022) Sarima model-based forecasting required number of COVID-19 vaccines globally and empirical analysis of peoples’ view towards the vaccines. Alex Eng J 61(12):12091–12110. https://doi.org/10.1016/j.aej.2022.05.051
Milzam R (2022) Sentiment analysis in Valorant game review using information gain. Jurnal Teknik Informatika CIT Medicom 14(2)
Mubarak H, Hassan S, Chowdhury SA, Alam F (2022) ArCovidVac: analyzing arabic tweets about COVID-19 vaccination. arXiv preprint arXiv:2201.06496
Muneer A, Fati SM, Akbar NA, Agustriawan D, Wahyudi ST (2022) iVaccine-Deep: prediction of COVID-19 mRNA vaccine degradation using deep learning. J King Saud Univ Comput Inf Sci 34(9):7419–7432
Ndasauka Y, Hou J, Wang Y, Yang L, Yang Z, Ye Z, Hao Y, Fallgatter AJ, Kong Y, Zhang X (2016) Excessive use of twitter among college students in the UK: validation of the microblog excessive use scale and relationship to social interaction and loneliness. Comput Human Behav 55:963–971
Noor TH, Almars A, Gad I, Atlam E-S, Elmezain M (2022) Spatial impressions monitoring during COVID-19 pandemic using machine learning techniques. Computers 11(4):52
Pal S, Ghosh S, Nag A (2018) Sentiment analysis in the light of LSTM recurrent neural networks. Int J Synth Emot (IJSE) 9(1):33–39
Prabhakar Kaila D, Prasad DA (2020) Informational flow on twitter–corona virus outbreak–topic modelling approach. Int J Adv Res Eng Technol (IJARET) 11(3)
Qing L, Linhong W, Xuehai D (2019) A novel neural network-based method for medical text classification. Future Internet 11(12):255
Ramadhan W, Novianty SA, Setianingsih SC (2017) Sentiment analysis using multinomial logistic regression. In: 2017 International conference on control, electronics, renewable energy and communications (ICCREC). IEEE, pp 46–49
Rehman AU, Malik AK, Raza B, Ali W (2019) A hybrid CNN-LSTM model for improving accuracy of movie reviews sentiment analysis. Multimed Tools Appl 78:26597–26613
Sattar NS, Arifuzzaman S (2021) COVID-19 vaccination awareness and aftermath: public sentiment analysis on twitter data and vaccinated population prediction in the USA. Appl Sci 11(13):6128
Szomszor M, Kostkova P, St Louis C (2011) Twitter informatics: tracking and understanding public reaction during the 2009 swine flu pandemic. In: 2011 IEEE/WIC/ACM international conferences on web intelligence and intelligent agent technology, vol 1. IEEE, pp 320–323
Villavicencio C, Macrohon JJ, Inbaraj XA, Jeng J-H, Hsieh J-G (2021) Twitter sentiment analysis towards COVID-19 vaccines in the Philippines using Naïve Bayes. Information 12(5):204
Vorovchenko T, Ariana P, Loggerenberg Fv, Amirian P (2017) #Ebola and Twitter. What insights can global health draw from social media? In: Big data in healthcare, Springer, pp 85–98. https://doi.org/10.1007/978-3-319-62990-2_5
Wang X, Ren H, Wang A (2022) Smish: A novel activation function for deep learning methods. Electronics 11(4):540
Web. (2021a) World Health Organization. Mental health and psychosocial considerations during the COVID-19 outbreak. WHO/2019-nCoV/MentalHealth/2020. Accessed 17 Aug 2021
Web. Nbc news (2021b) www.nbcnews.com. Accessed 28 Aug 2021
Web. Nbc news (2021c) https://www.tweepy.org. Accessed 30 July 2021
Web. Web (2021d) https://www.nltk.org/howto/stem.html. Accessed 31 July 2021
Web. Nbc news (2021e) https://saifmohammad.com/WebPages/ArabicSA.html. Accessed 31 July 2021
Web. Web (2022a) https://github.com/stuti-sharma/Sentiment-Analysis-Twitter-RapidMiner. Accessed 28 Aug 2021
Web. Nbc news (2022b) https://developer.twitter.com/en/docs/twitter-api. Accessed 22 July 2021
Weng L, Li Q, Xuehai D (2019) GRU based convolutional neural network with initialized filters for text classification. Aust J Intell Inf Process Syst 15(2):75–85
Wu JM-T, Li Z, Herencsar N, Vo B, Lin JC-W (2021) A graph-based CNN-LSTM stock price prediction algorithm with leading indicators. Multimed Syst 29(3):1–20
Wu L, Kong C, Hao X, Chen W (2020) A short-term load forecasting method based on GRU-CNN hybrid neural network model. Math Probl Eng 2020(1):1–10
Yin X, Niu Z, He Z, Li ZS, Lee D-H (2020) Ensemble deep learning based semi-supervised soft sensor modeling method and its application on quality prediction for coal preparation process. Adv Eng Inform 46:101136

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.