
Twitter and Emotions: Exploring Sentiment Detection

José Carmen Morales Castro, Tirtha Prasad Mukhopadhyay, Rafael Guzmán Cabrera (corresponding author)
Universidad de Guanajuato, México
jc.moralescastro@ugto.mx, tirtha@ugto.mx, guzmanc@ugto.mx

John R. Baker
University of Economics and Finance, Vietnam; Shinawatra University, Thailand
drjohnrbaker@yahoo.com

Abstract—Human emotions are often discerned through tone, facial expressions, and gestures in face-to-face interactions. However, the question arises: can sentiment be accurately identified from unstructured text on social networks? In this study, we demonstrate that it is indeed possible. We applied four machine learning methods—Deep Learning, Decision Trees, Naive Bayes, and Support Vector Machines—in two classification scenarios, cross-validation and training/test sets, enhanced by a meta-classifier. Our goal was to identify which combination of classification scenario, learning method, and preprocessing performs best in sentiment analysis. To validate our approach, we used a manually labeled corpus, forming three datasets of different sizes with varying preprocessing techniques. The results underscore the viability and effectiveness of the proposed approach and provide implications for various fields (product development, marketing, political analysis, customer service education, linguistic education).

Keywords—natural language preprocessing, sentiment analysis, machine learning

I. INTRODUCTION

Within the context of the exponential growth of social networks, Twitter stands out as a virtual space where millions of users share their opinions, emotions, and experiences in real time. Such a platform offers a unique window into understanding how people relate to their surroundings, which is why sentiment analysis has become essential for understanding the complexity of human expressions in the digital world [6]. However, sentiment analysis on social networks like Twitter presents various challenges due to the ambiguous nature of the messages. To overcome this ambiguity, employing a multifaceted approach that combines different techniques and methodologies is vital. One example of these techniques is using base classifiers and linguistic resources, which provide a foundation for identifying sentiments and categorizing tweets as positive, negative, or neutral. This can significantly facilitate initial data processing [4]. Such work also involves a meta-classifier that integrates multiple models and approaches to generate more robust and reliable predictions about a tweet's sentiment. Additionally, including a Deep Learning technique allows us to explore complex and non-linear data patterns.

II. PROBLEM STATEMENT

A key challenge is automatically identifying sentiment in unstructured texts, particularly tweets, using an architecture that combines base classifiers and linguistic resources. To address this, we developed automated tools to extract subjective information (opinions or feelings) from natural language texts. This process allows for the generation of structured, processible knowledge for decision-making systems, enabling a better understanding of users' perceptions and facilitating the adoption of strategic measures based on accurate, relevant information.

III. METHODS

To address sentiment in unstructured texts (tweets), we began with an exhaustive review of related work, expanding upon an established design [6], to identify the different types of classifiers, methodologies, and evaluation metrics. This culminated in a research design that allowed us to address the task competitively and efficiently (Fig. 1).

Fig. 1. Research Design.

In the next stage, we selected a suitable database and performed data preprocessing, applying techniques drawn from related research. We utilized a dataset of approximately 163,000 manually labeled tweets, categorized by polarity as positive, negative, or neutral. These tweets were sourced from an archived dataset [see 6]. Our study builds on this previous work by expanding the scope through the introduction of new deep learning models, such as convolutional neural networks (CNNs), and by analyzing additional dimensions of sentiment classification. Furthermore, we enhanced the methodological framework by incorporating additional machine learning models and refining the preprocessing techniques. This extension enables us to explore more advanced methodologies and provides deeper insights into how various approaches affect the accuracy and scalability of sentiment analysis tasks, particularly when working with large datasets. In this case, the aim was to evaluate perceptions of public figures on social networks during a national event (the 2019 general elections in India) and to automatically classify these opinions, facilitating adjustments to public figures' messaging or moderating discourse based on public sentiment.

The texts were labeled with opinion values: (0) for neutral, (1) for positive, and (-1) for negative. After identifying the dataset, we divided it into two subsets for the analysis, consisting of 1,000 and 5,000 tweets, respectively. The choice of these specific sizes facilitated comparison between a smaller and a larger dataset, allowing us to observe the impact of dataset size on model performance. For both subsets, all messages were filtered using relevant keywords, including hashtags and stems. This process was included in our experimental design to ensure consistency and relevance in the data utilized for sentiment analysis.

Afterward, CNNs were used to compare the results. In this context, we chose to implement CNNs in the Weka platform using the WekaDeeplearning4j extension, which is based on the library of the same name and follows a specific procedure that begins with installing the extension in the platform. This extension offers a graphical interface for configuring, training, and evaluating deep learning models, can extract spatial features from data, and offers an API for integration with Java applications. A key feature is its ability to leverage GPUs and distributed clusters, significantly speeding up model training and inference, especially with large datasets [5].

In the next step, once the corpus was obtained and the experiment was undertaken, data preprocessing (Figure 2) was performed. For this stage, a series of steps were taken to standardize the structure of all the tweets in the corpus, facilitating their interpretation during processing.

Fig. 2. Steps followed within preprocessing.

Five steps were taken to carry out the preprocessing. First, stopwords or empty words were removed, i.e., those that have no meaning by themselves [4]. Then, uppercase letters were converted to lowercase to homogenize the corpus. Afterward, the tweets were tokenized, segmenting the text into phrases or words. Next, the tweets were lemmatized to reduce morphological variability and improve the accuracy of text processing. Finally, the information gain technique was applied to measure the relevance of an attribute within a dataset.
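The text-normalization steps can be sketched in Python. The study ran its pipeline in Weka, so this is only an illustrative stand-in: the stopword list, the tokenizer pattern, and the tiny lemma table below are assumptions for the example, not the authors' actual resources.

```python
import re

# Illustrative stand-ins (assumed, not from the paper): a toy stopword list
# and a toy lemma table replacing a real lemmatizer.
STOPWORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "in", "on"}
LEMMAS = {"winning": "win", "wins": "win", "voted": "vote", "votes": "vote"}

def preprocess(tweet: str) -> list:
    """Apply the normalization steps described above to one tweet."""
    tweet = tweet.lower()                               # lowercase the text
    tokens = re.findall(r"[#@]?\w+", tweet)             # tokenize into words
    tokens = [t for t in tokens if t not in STOPWORDS]  # remove stopwords
    return [LEMMAS.get(t, t) for t in tokens]           # lemmatize known forms

print(preprocess("The candidate IS winning votes in the polls"))
# ['candidate', 'win', 'vote', 'polls']
```

A real pipeline would swap in a full stopword list and a proper lemmatizer; the control flow, however, mirrors the five-step order described above (information gain, the fifth step, operates on the resulting attributes rather than on individual tweets).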
Considering the preprocessing phase, both datasets were divided into four files, each at a different preprocessing stage. We termed the first set the baseline; this file received no additional preprocessing, keeping the tweets in their original form without stopword removal, lemmatization, or application of the information gain technique. For the second set, the full preprocessing was carried out: eliminating stopwords, lemmatizing, and applying information gain, resulting in 352 selected attributes for the 1,000-tweet set and 637 attributes for the 5,000-tweet set. For the third set, only the information gain technique was applied to select the most relevant attributes, resulting in 422 attributes for the 1,000-tweet set and 2,399 attributes for the 5,000-tweet set. Finally, the fourth set followed a procedure similar to the second file's, eliminating stopwords and lemmatizing but without applying information gain, generating 3,319 attributes for the 1,000-tweet set and 597 attributes for the 5,000-tweet set.
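Information gain, the attribute-selection criterion used above, can be computed directly as the reduction in class entropy given a feature. A minimal sketch with a binary term-presence feature and toy labels (not the study's actual feature extraction):

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy H(Y) of a label list, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(feature, labels):
    """IG(Y; X) = H(Y) - sum over values v of p(X=v) * H(Y | X=v)."""
    gain, n = entropy(labels), len(labels)
    for v in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == v]
        gain -= len(subset) / n * entropy(subset)
    return gain

# Toy example: does the presence of a token (1/0) predict tweet polarity?
has_token = [1, 1, 1, 0, 0, 0]
polarity = ["pos", "pos", "pos", "neg", "neg", "neg"]
print(round(info_gain(has_token, polarity), 3))  # 1.0 — perfectly informative
```

Ranking every candidate attribute by this score and keeping the top scorers is what yields the reduced attribute counts reported above.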
Our research used two classification scenarios. The first was 10-fold cross-validation, a method for evaluating predictive models and preventing overfitting: the model is trained on a subset of the data and validated on the remaining folds, and this process is repeated ten times to ensure an accurate evaluation and classification [5]. The second scenario used training and testing sets, where the dataset is divided into two parts, one for training and one for testing. Most of the data were used in training the model, while a smaller portion was allocated for testing and evaluating its performance. The training set adjusted the model using selected data to improve its accuracy on new data, and the testing set evaluated the model's performance, guarding against overfitting by comparing predictions with actual classifications [6].
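The two scenarios amount to different ways of partitioning example indices. The evaluation itself was run in Weka; the sketch below only illustrates the bookkeeping, using contiguous folds for clarity (assumed — fold assignment is typically randomized in practice).

```python
def kfold_indices(n, k=10):
    """Split range(n) into k folds; each fold serves once as the validation
    set while the remaining indices form the training set."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        folds.append((train, test))
        start += size
    return folds

# First scenario: 10-fold cross-validation over the 1,000-tweet subset.
splits = kfold_indices(1000, k=10)
train, test = splits[0]
print(len(train), len(test))  # 900 100

# Second scenario: a single holdout split, e.g. 80% train / 20% test
# (the paper does not state its exact ratio; 80/20 is an assumption).
cut = int(1000 * 0.8)
train_idx, test_idx = list(range(cut)), list(range(cut, 1000))
```

Averaging a model's score over the ten folds gives the cross-validation estimate; the holdout split instead reports a single score on the reserved test portion.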
In both cases, supervised learning methods were used to classify the comments according to their corresponding labels. These techniques included (1) Support Vector Machines (SVM), a learning-based method that solves classification and regression problems through training and resolution phases, the result of which is a proposed output for an established problem [7]; (2) Naive Bayes (NB), a classifier that calculates the probability of an event given prior information about it, based on Bayes' theorem with additional independence assumptions [8]; and (3) Decision Trees (J48 algorithm), a machine learning algorithm that builds decision trees for classification, selecting at each node the feature with the highest discrimination capacity to divide the dataset into subsets [9]. These techniques proved effective in achieving precise class separation and obtaining high performance in comment classification.

SVM was chosen because of its ability to handle nonlinear data and its effectiveness in classifying short texts such as tweets, where features are not always easily separable. Naive Bayes was selected for its simplicity and speed of training, making it ideal for processing large volumes of data quickly. Decision Trees (J48) provide a clear visualization of decision
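The meta-classifier that combines these base models is described only at a high level. One common way to realize it, assumed here for illustration (the paper's actual combination rule may differ), is a simple majority vote over the per-model labels:

```python
from collections import Counter

def majority_vote(predictions):
    """Fuse one predicted label per base model (e.g., SVM, NB, J48, CNN)
    into a single sentiment; ties resolve to the earliest-seen label,
    following Counter's insertion-order tie-breaking."""
    return Counter(predictions).most_common(1)[0][0]

# One tweet, one label from each of four base classifiers.
votes = ["positive", "positive", "neutral", "positive"]
print(majority_vote(votes))  # positive
```

Weighted voting or stacking (training a second-level model on the base models' outputs) are the usual refinements when a plain majority proves too coarse.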
and provides deeper insights into how various approaches processing. Finally, the information gain technique was
affect the accuracy and scalability of sentiment analysis tasks, applied to measure the relevance of an attribute within a
particularly when working with large datasets. In this case, to dataset.
evaluate perceptions of public figures on social networks
during a national event (2019 general elections in India) and Considering the preprocessing phase, both datasets were
to automatically classify these opinions, facilitating divided into four files, each with different preprocessing
adjustments to public figures' messaging or moderating stages. We termed the first set baseline; this file did not
discourse based on public sentiment. experience additional preprocessing, keeping the tweets in
their original form without stopword removal, lemmatization,
The texts were labeled with opinion values: (0) for neutral, or application of information gain techniques. For the next set,
(1) for positive, and (-1) for negative. After identifying the preprocessing was carried out. This consisted of eliminating
dataset, we divided it into two subsets for the analysis: one stopwords, lemmatizing, and applying information gain,
consisting of 1,000 tweets and the other of 5,000 tweets. The resulting in 352 selected attributes for the set of 1000 data and
choice of these specific sizes was to facilitate comparison 637 attributes for the set of 5000 data. For the third set, only
between a smaller and larger dataset, allowing us to observe the information gain technique was applied to select the most
the impact of dataset size on model performance. For both relevant attributes, resulting in 422 attributes for the 1000 data
subsets, all messages were filtered using relevant keywords, set and 2399 attributes for the 5000 data set. Finally, a similar
including hashtags and stems. This process was part of our procedure was undertaken for the fourth set and the second
experimental design to ensure consistency and relevance in file. This resulted in the elimination of stop words and
the data utilized for sentiment analysis. lemmatization, generating 3319 attributes for the set of 1000
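The five preprocessing steps described above were performed on the Weka platform; as a language-agnostic illustration, they can be sketched in Python. The stopword list and the suffix-stripping "lemmatizer" below are toy stand-ins (a real pipeline would use a full stopword list and a dictionary-based lemmatizer), the example tweet is invented, and the sketch tokenizes before removing stopwords, which is the usual practical ordering:

```python
import re

# Toy stand-ins: a real pipeline would use a complete stopword list and a
# dictionary-based lemmatizer rather than naive suffix stripping.
STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in"}

def lemmatize(token: str) -> str:
    """Crude suffix stripping, standing in for true lemmatization."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(tweet: str) -> list[str]:
    text = tweet.lower()                                 # lowercase the corpus
    tokens = re.findall(r"[a-z#@']+", text)              # tokenize into words
    tokens = [t for t in tokens if t not in STOPWORDS]   # drop stopwords
    return [lemmatize(t) for t in tokens]                # reduce variability

print(preprocess("The candidates ARE debating the new policies"))
# ['candidat', 'debat', 'new', 'polici']
```

The tokens produced this way form the attribute (term) space over which the later attribute-selection and classification stages operate.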
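The information gain technique scores each candidate attribute by how much knowing its presence reduces uncertainty about the class label; Weka computes this internally (its InfoGainAttributeEval evaluator). The following minimal Python sketch, over an invented four-tweet corpus, shows the underlying calculation:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(term, docs, labels):
    """Reduction in class entropy from splitting on the presence of `term`."""
    with_term = [lab for doc, lab in zip(docs, labels) if term in doc]
    without = [lab for doc, lab in zip(docs, labels) if term not in doc]
    n = len(labels)
    remainder = sum(len(part) / n * entropy(part)
                    for part in (with_term, without) if part)
    return entropy(labels) - remainder

# Invented labelled corpus (token sets): 1 = positive, -1 = negative
docs = [{"great", "win"}, {"great", "rally"}, {"fraud", "loss"}, {"fraud", "bad"}]
labels = [1, 1, -1, -1]

print(information_gain("great", docs, labels))  # 1.0: perfectly separates classes
print(information_gain("rally", docs, labels))  # ~0.31: only weakly informative
```

Keeping only the highest-scoring terms is what reduces the attribute counts reported for the information-gain file sets.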
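The two evaluation scenarios, 10-fold cross-validation and a separate train/test split, can be illustrated with a small index-partitioning sketch. This is plain Python for clarity (Weka provides both splitters internally), and the 80/20 holdout ratio is illustrative; the paper does not state the exact proportion used:

```python
import random

def kfold_indices(n: int, k: int = 10, seed: int = 0):
    """Yield (train, test) index lists; each example is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, folds[i]

def holdout_indices(n: int, test_ratio: float = 0.2, seed: int = 0):
    """Single train/test split: most data trains, the rest evaluates."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(n * (1 - test_ratio))
    return idx[:cut], idx[cut:]

# Over 1000 examples, the 10 test folds cover the whole dataset exactly once.
tested = [j for _, test in kfold_indices(1000) for j in test]
assert sorted(tested) == list(range(1000))

train, test = holdout_indices(1000)
print(len(train), len(test))  # 800 200
```

Averaging a model's score over the ten test folds gives the cross-validation estimate; the holdout split instead reserves one untouched portion for final evaluation.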
Precision was used as the evaluation metric: a performance measure applied to the data retrieved from a set, corpus, or sample space. It is also termed the positive predictive value, representing the proportion of relevant instances among those retrieved, as indicated in Eq. (1):

    Precision = tp / (tp + fp)        (1)

where tp corresponds to a true positive count and fp to a false positive count [2].

Finally, as an additional step, a meta-classifier was implemented that combined the three learning techniques with the best accuracy percentages obtained in the experiments: SVM, Naive Bayes, and Decision Trees.

The results were presented in tables and comparative graphs, showing the best outcome for each set under both classification scenarios on the Weka platform. These highlight the highest precision values, indicating the percentage of correctly classified instances.
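The precision formula in Eq. (1) and the combination of base classifiers can be sketched as follows. The per-model predictions are invented for illustration, and majority voting is one plausible combination scheme; the paper does not specify exactly how the Weka meta-classifier combined the three models:

```python
from collections import Counter

def precision(y_true, y_pred, positive=1):
    """Eq. (1): precision = tp / (tp + fp) for the chosen positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    return tp / (tp + fp) if tp + fp else 0.0

def majority_vote(per_model_predictions):
    """Combine base classifiers (e.g., SVM, NB, J48) by simple majority."""
    return [Counter(votes).most_common(1)[0][0]
            for votes in zip(*per_model_predictions)]

# Invented predictions on five tweets; labels: 1 = pos, 0 = neutral, -1 = neg
svm = [1, -1, 1, 0, 1]
nb  = [1, -1, -1, 0, 1]
j48 = [1, 1, 1, 0, -1]
y_true = [1, -1, 1, 0, 1]

combined = majority_vote([svm, nb, j48])
print(combined)                      # [1, -1, 1, 0, 1]
print(precision(y_true, combined))   # 1.0
```

In this toy example the vote corrects each base model's single error, which mirrors the intuition behind combining the three classifiers.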
IV. RESULTS

This study aimed to investigate sentiment analysis on Twitter using various machine learning techniques, particularly Support Vector Machines (SVM), Naive Bayes, and Decision Trees. We conducted experiments on two datasets of 1,000 and 5,000 tweets, employing different preprocessing techniques.

Our key findings indicate that SVM consistently outperformed the other classifiers across the various preprocessing methods and dataset sizes. The preprocessing approach utilizing information gain yielded the best results for both datasets. Additionally, the cross-validation and training/testing scenarios revealed similar performance trends, while a meta-classifier combining SVM, Naive Bayes, and Decision Trees improved overall accuracy.

The detailed results are presented in the following figures.

Fig. 1: Comparison of 1000 data Cross-Validation.

Fig. 2: Comparison of 1000 data Training and Testing Sets.

The following figures show the results obtained for the dataset containing 5000 tweets.

Fig. 3: Comparison of 5000 data Cross-Validation.

Fig. 4: Comparison of 5000 data Training and Testing Sets.
Our results demonstrate that sentiment analysis plays a crucial role in extracting information from unstructured texts on Twitter, generating structured knowledge useful for decision-making. The Support Vector Machine (SVM) algorithm consistently obtained the best performance across all experiments, particularly with the preprocessing that included information gain. For the 1000-tweet dataset, this preprocessing resulted in 422 attributes, while for the 5000-tweet dataset it yielded 637 attributes.

V. DISCUSSION AND CONCLUSION

This study demonstrates that classifiers such as SVM, Naive Bayes, and Decision Trees can achieve high accuracy in sentiment classification. The strong performance of these classifiers, especially when combined in a meta-classifier, contributes to the development of automated tools for extracting information from unstructured text. These tools can aid decision-making processes by providing relevant and precise data derived from social media sentiment.

The sentiment analysis techniques explored in this study have significant implications across various fields. In product development, they allow companies to better understand user reactions and make data-driven improvements. In marketing, these tools help gauge public sentiment toward campaigns, enabling real-time strategy adjustments. In political analysis, they offer insights into public opinion, benefiting campaign managers and policymakers. Additionally, organizations can enhance customer service by automating the categorization and prioritization of feedback. Finally, they have important implications for linguistic education, particularly in preparing students who intend to enter these areas.

However, this study has limitations that future research should address. The dataset's focus on specific political figures suggests a need for broader topic exploration to test the generalizability of the methods. Further studies could also examine the effectiveness of these techniques in different languages, analyze temporal aspects, compare traditional and deep learning approaches more comprehensively, and develop systems for real-time sentiment analysis.

REFERENCES

[1] Berrar, D. (2019). Cross-validation [White paper]. Department of Information and Communications Engineering, School of Engineering, Tokyo Institute of Technology, Tokyo, Japan.
[2] Bowers, A. J., & Zhou, X. (2019). Receiver operating characteristic (ROC) area under the curve (AUC): A diagnostic measure for evaluating the accuracy of predictors of education outcomes. Journal of Education for Students Placed at Risk, 24(1), 20-46.
[3] Castro, J. C. M., Carrillo, L. M. L., & Cabrera, R. G. (2022). Identificación de polaridad en Twitter usando validación cruzada. Identidad Energética, 4, 86-90.
[4] Jianqiang, Z., & Xiaolin, G. (2017). Comparison research on text pre-processing methods on Twitter sentiment analysis. IEEE Access, 5, 2870-2879.
[5] Lang, S., Bravo-Marquez, F., Beckham, C., Hall, M., & Frank, E. (2019). WekaDeeplearning4j: A deep learning package for Weka based on Deeplearning4j. Knowledge-Based Systems, 178, 48-50.
[6] Morales-Castro, J. C., Pérez-Crespo, J. A., Prasad-Mukhopadhyay, T., & Guzmán-Cabrera, R. (2022a). Automatic identification of sentiment in unstructured text. 6(15), 22-28.
[7] Morales Castro, W., & Guzmán Cabrera, R. (2020). Tuberculosis: Diagnóstico mediante procesamiento de imágenes. Computación y Sistemas, 24(2), 875-882.
[8] Santana Mansilla, P. F., Costaguta, R. N., & Missio, D. (2014). Aplicación de algoritmos de clasificación de minería de textos para el reconocimiento de habilidades de e-tutores colaborativos. Inteligencia Artificial, 18(54), 2-11.
[9] Witten, I. H., Frank, E., Hall, M. A., & Pal, C. J. (2005). Data Mining: Practical Machine Learning Tools and Techniques. Elsevier.