
LAUTECH Journal of Engineering and Technology 18 (1) 2024: 47-56

APPLICATION OF MACHINE LEARNING TO TEXT CLASSIFICATION


1*Ozoh P., 2Rasheed S., 3Akanbi C., 4Olayiwola M., 5Ibrahim M., 6Kolawole M., 7Olubusayo O., 8Adigun A.

1,2,3,5,8 Department of ICT, Osun State University, Nigeria
4,6 Department of Mathematical Sciences, Osun State University, Nigeria
7 Department of Physics, Osun State University, Nigeria

Corresponding author emails: patrick.ozoh@uniosun.edu.ng; olayiwola.oyedunsi@uniosun.edu.ng

ABSTRACT
The internet provides an important channel for distributing information to a wide audience. Organizations depend on knowing customer opinions about their products and services, but the volume of data involved can be too large to process manually. This study investigates a technique that applies Python programming to collect datasets automatically. Machine learning models are developed by applying the Random Forest and Naïve Bayes algorithms to the collected data for text classification purposes. This process classifies the data as positive, negative, slightly negative, slightly positive, or neutral. The results from the study show that the Random Forest classifier is more efficient than the Naïve Bayes algorithm, achieving an accuracy of 76.5% compared with 70.01% for Naïve Bayes. This technique enables organizations to gain insight into how their customers think.

Keywords: Text classification, Internet community, Random Forest (RF), Insight, Data scraping.

INTRODUCTION

The application of machine learning models to analyze articles was discussed by Rejeb et al. (2024). The study shows that ChatGPT is an important tool for students and educators, indicating ChatGPT's crucial role in supporting students' writing tasks and fostering an interactive learning community. The study also identifies theoretical and practical concerns for applying ChatGPT in educational institutions. Choe et al. (2024) investigate scalable meta learning by introducing SAMA, which integrates classification algorithms and models. When the SAMA algorithm is compared with large-scale learning benchmarks, SAMA produces a reduction in storage requirements, and SAMA-based data optimization produces consistent improvements in text classification accuracy. Abubakr et al. (2024) present a comparative analysis between two models for multi-class classification. The result from the study indicates the proposed application of the deep learning technique yielded an accuracy of 94.95%, compared to 85.71% for the previous technique.

Mupaikwa (2024) proposed the application of machine learning in digital libraries. The technique utilized K-nearest neighbor, Bayesian networks, fuzzy logic, support vector machines, clustering, and classification algorithms. The paper proposed the training of librarians, curriculum reviews, and research on Python-based technology for libraries. Büyükkeçeci & Okur (2024) discuss the feature selection technique for selecting the features relevant to a machine learning task. Their study focused on feature selection and feature selection stability. This technique reduces dataset size, which plays a role in improving the performance of machine learning models. Valtonen et al. (2024) examined a standard research database of unstructured text and encountered representativeness differences between preprocessing choices and unsupervised machine learning (UML) algorithms that hamper research undertakings and transparency. The study calls for contextual representations to focus on these issues and offers recommendations for addressing the contextual suitability of UML in research settings. A review of past research works on text mining was done by Shamshiri et al. (2024). The paper surveys several research works with specialized functions, and its findings highlight important insights for further progress in construction research and its connection to academia and industry.
Duan et al. (2024) proposed measuring a dataset, including social media data, to integrate with a system's decision-making process, where the process depends on several types of data collected from different sources. The research uses text-mining techniques to process Twitter data and applies Naïve Bayes, Random Forest, and XGBoost techniques to classify comments on social media. The paper uses a sampling method to handle imbalances in class distribution and obtains public opinion about street cleanliness. This research can be applied to other social media platforms, including Facebook, and can be used to derive costs and assess the efficiency of the approach. Umer et al. (2023) propose a CNN model for text classification. The technique was applied to the classification model to produce a word-embedding model and has also been applied to Twitter data. The system shows the reliability of the FastText word-embedding approach.

A practical framework is presented in Pal et al. (2023). The paper addresses research challenges by investigating user comments for selected websites. The study selects the principal variables, known as predictors, and classifies the predictors into two groups depending on their relative importance. The results from the study indicate that time, cost, responsiveness, and accessibility are predictors of significant user experience on the internet, and the recommendations from this research will improve quality, resulting in more user satisfaction. Kariri et al. (2023) examine the overall study of ANNs and provide directions for future research, surveying numerous articles across various journals using a text-mining technique. The study indicates that research in machine learning is increasing, and it calls for the availability of a framework to provide a robust basis for studying ANNs. Abdusalomovna (2023) presents a framework for examining unstructured text in databases and transforming it into structured data usable by artificial intelligence (AI) technology.

METHODOLOGY

This research utilizes web scraping as a method for data collection, employing a scraper developed in the Python programming language. This approach is chosen over the conventional method of copy and paste due to its efficiency and time-saving capabilities. Web scraping automates the process, enabling the collection of large volumes of data from websites in a matter of minutes, a task that would otherwise be tedious and time-consuming. It is important to note that the web scraping here is limited to textual comments and does not include animations or images. The research focuses on gathering data spanning five years of reviews on both perishable and non-perishable food products from Amazon's webpage. A total of 113,683 reviews were collected using this method. Random Forest and Naïve Bayes classifiers were selected for analysis, as they are known to perform well with large datasets.
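The paper does not identify the libraries behind its scraper; as an illustration only, the sketch below shows one common way such a collector is written in Python with the requests and BeautifulSoup packages. The URL and the CSS selector are hypothetical placeholders, not Amazon's actual page structure.

```python
# Hypothetical sketch of a review scraper; the URL and the CSS selector
# below are placeholders, not a real site's markup.
import requests
from bs4 import BeautifulSoup

def scrape_reviews(url: str) -> list[str]:
    """Fetch one page of a product-review listing and return its review texts."""
    response = requests.get(url, headers={"User-Agent": "review-scraper/0.1"}, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    # Only textual comments are collected; images and animations are ignored.
    return [node.get_text(strip=True) for node in soup.select("div.review-text")]

reviews = scrape_reviews("https://example.com/product/123/reviews?page=1")
print(len(reviews), "reviews collected from this page")
```

In practice such a loop is repeated over many result pages until the full corpus (here, 113,683 reviews) is assembled.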

The system architecture is illustrated in Figure 1, which provides an overview of the research framework. The data in the proposed architecture are collected with the help of the web scraping method and are later pre-processed (text transformation).

Figure 1. System architecture.

Figure 2. Flowchart depicting the web scraping process.

The dataset is divided into train and test sets. The train set is input into each algorithm to develop the models. The test set is input into the trained models to predict the results, and the output from the models is analyzed to investigate their performance. Additionally, the flowchart outlining the data collection and analysis process is presented in Figure 2, offering a visual representation of the methodology employed in the study.

Random forest and Naïve Bayes classifiers

The Random Forest classifier selects its output category based on a majority vote, whereby the most frequently occurring category among the predictions from multiple trees is taken as the final result. This approach ensures robustness and reliability in classification. Moreover, Random Forest classifiers are user-friendly, requiring minimal expertise and programming skill; they are accessible to both experts and novices, making them suitable for individuals without an extensive mathematical background.

The Naïve Bayes classifier is a method based on Bayes' theorem. It operates under the assumption that the presence of a particular feature during classification is independent of the presence of other features. This model is particularly advantageous for handling very large datasets due to its simplicity and ease of implementation. In addition to its simplicity, the Naïve Bayes classifier is well-suited for problems that involve assigning objects to discrete categories. It belongs to the group of numerically-based approaches and offers several benefits, including simplicity, speed, and high accuracy. Overall, Naïve Bayes classifiers provide a straightforward and efficient solution for a wide range of classification tasks. Spiteri et al. (2020) describe the Bayes rule as:

γ(α|β) = (γ(α) ∗ γ(β|α)) / γ(β)    (1)

where α is the specific class, β is the document to be classified, γ(α) and γ(β) are the prior probabilities, γ(β|α) is the likelihood, and γ(α|β) is the posterior probability. The value of class α might be positive, slightly negative, negative, or neutral.

A review of a food product can be considered as a document. Verzi & Auger (2021) highlighted that the multinomial model of Naive Bayes effectively captures word frequency information within documents. The Maximum Likelihood Estimate (MLE) determines the most likely value for each parameter given the training data, thereby providing a reliable ratio. This approach helps in accurately estimating the parameters from the available training data. For the prior, this estimate is given as:

γ(α) = Nc / N    (2)

where Nc is the number of training documents in class α and N is the total number of documents. The multinomial model assumes every attribute value to be independent of the others, given the class:

γ(β|α) = γ(φ1, ..., φnd | α)    (3)

where φ1, ..., φnd are the words of document β. In the multinomial model, a document is structured as a sequence of word occurrences drawn from the same vocabulary, denoted as V. Each document, denoted as βi, is considered independent of the others. The parameter βi represents the distribution of words within each document, following a multinomial distribution with numerous independent trials. This results in the common bag-of-words (BOW) representation for documents. The BOW model is commonly utilized in document classification tasks, where the frequencies of word occurrences serve as features for training classifiers. A unigram feature is employed to indicate the presence of a single word within a text interval. This approach enables the representation of documents based on the occurrence of individual words, facilitating effective classification processes.
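To make the bag-of-words idea concrete, the short sketch below builds unigram count features with scikit-learn's CountVectorizer (an assumption for illustration; the paper does not name its feature-extraction code). Each review becomes a vector of word counts over the vocabulary V.

```python
# Bag-of-words (unigram) features with scikit-learn's CountVectorizer.
from sklearn.feature_extraction.text import CountVectorizer

reviews = ["Sweet food", "Not good as advertised", "Bad food"]
vectorizer = CountVectorizer()             # unigram counts by default
X = vectorizer.fit_transform(reviews)      # sparse document-term matrix

print(vectorizer.get_feature_names_out())  # the vocabulary V, |V| = 7 here:
# ['advertised' 'as' 'bad' 'food' 'good' 'not' 'sweet']
print(X.toarray())                         # word counts per document
```

Note that this tiny corpus is the same one used in the worked example of the next subsection, where the seven-term vocabulary appears in the smoothing denominators.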


The conditional probability γ(φ|α) is estimated as the relative frequency of term φ in documents belonging to class α, counting multiple occurrences of a term within a document:

γ(φ|α) = (count(φ, α) + 1) / (count(α) + |V|)    (4)

where count(φ, α) is the number of occurrences of φ in training documents from class α, count(α) is the number of words in that class, and |V| is the number of terms in the vocabulary.

To address the issue of zero probability, the add-one or Laplace smoothing technique is applied, which involves adding one to every count. This adjustment ensures that no probability value is zero. Subsequently, the likelihood of a document given its category is calculated using the multinomial distribution, as presented in Equation (4). Finally, utilizing the posterior probability, the new document is classified.

Let αNB represent the class with the maximum posterior probability, where αj is a class from α and βi is the ith document. By calculating the posterior probability based on the likelihood of the document given its category, classification of the new document can be achieved effectively:

αNB = arg max αj ∈ α  γ(αj) ∏i γ(βi|αj)    (5)

Consider Table 1 as a dataset comprising product reviews. The objective of the model is to classify these reviews into either positive or negative categories. Table 1 provides an overview of the structure of the dataset, serving as the foundation for the classification process.

Table 1: Sample dataset

Set        ID  Review                  Sentiment
Train set  1   Sweet food              Positive
           2   Not good as advertised  Negative
           3   Bad food                Negative
Test set   4   Bad food                Negative

Calculate the prior probabilities using Equation (2):

γ(positive) = 1/3    (6)
γ(negative) = 2/3    (7)

Calculate the conditional probabilities (maximum likelihood estimates with Laplace smoothing) using Equation (4). The positive class contains 2 words, the negative class contains 6 words, and the vocabulary has |V| = 7 terms:

γ(bad|positive) = (0 + 1)/(2 + 7) = 0.1111    (8)
γ(bad|negative) = (1 + 1)/(6 + 7) = 0.1538    (9)
γ(food|positive) = (1 + 1)/(2 + 7) = 0.2222    (10)
γ(food|negative) = (1 + 1)/(6 + 7) = 0.1538    (11)

Calculate the posterior probabilities for test document 4 ("Bad food") using Equation (5):

γ(positive|d4) = 1/3 ∗ 0.1111 ∗ 0.2222 = 0.0082    (12)
γ(negative|d4) = 2/3 ∗ 0.1538 ∗ 0.1538 = 0.0158    (13)
γ(negative|d4) > γ(positive|d4)    (14)

Since γ(negative|d4) is the maximum, the posterior probability of the negative class for document 4 is the highest, so document 4 is classified as negative.
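The same computation can be checked mechanically. The script below reproduces Equations (2), (4), and (5) on the Table 1 data in plain Python; it is a verification sketch, not the authors' code.

```python
# Worked Naive Bayes example from Table 1 (Equations 2, 4, and 5).
from collections import Counter

train = [("sweet food", "positive"),
         ("not good as advertised", "negative"),
         ("bad food", "negative")]
test_doc = "bad food"

vocab = {w for text, _ in train for w in text.split()}           # |V| = 7
classes = ["positive", "negative"]
docs = {c: [t for t, y in train if y == c] for c in classes}
words = {c: Counter(w for t in docs[c] for w in t.split()) for c in classes}
total = {c: sum(words[c].values()) for c in classes}             # 2 and 6 words

def prior(c):                                 # Equation (2): Nc / N
    return len(docs[c]) / len(train)

def likelihood(w, c):                         # Equation (4): Laplace smoothing
    return (words[c][w] + 1) / (total[c] + len(vocab))

for c in classes:
    posterior = prior(c)
    for w in test_doc.split():
        posterior *= likelihood(w, c)
    print(c, round(posterior, 4))
# positive 0.0082, negative 0.0158 -> document 4 is classified negative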
Performance evaluation

In this experiment, performance metrics are employed for the accuracy analysis of the algorithms. The proposed system is evaluated using several measures, which include precision, recall, and F1-score.

1. Precision: deals with the ability of the classifier not to label a negative sample as positive. How often the classifier is correct when it predicts positive is defined as TP/(TP + FP).

2. Recall: deals with the ability of the classifier to find all positive instances. It is defined as the ratio of true positives to the sum of true positives and false negatives for each class: TP/(TP + FN).

3. F1-score: the harmonic mean of precision and recall, measuring the accuracy of the classifier for each class: (2 ∗ precision ∗ recall)/(precision + recall).

RESULTS AND DISCUSSION

The dataset used for the experiments contains reviews about perishable and non-perishable food products from Amazon's web page, with the labels Positive, Negative, Slightly positive, Slightly negative, and Neutral. The sample dataset is shown in Table 2.

Table 2. Dataset.

ID  Review                                        Sentiment
1   Good quality dog food                         Positive
2   Not as advertised                             Negative
3   "Delight" says it all                         Slightly positive
4   Cough medicine                                Neutral
5   Great taffy                                   Slightly positive
6   Nice taffy                                    Slightly positive
7   Great! Just as good as the expensive brands!  Positive
8   Wonderful, tasty taffy                        Positive
9   Yay barley                                    Positive
10  Healthy dog food                              Positive
11  The best hot sauce in the world               Positive
12  My cats LOVE this "diet" better than theirs   Positive
13  My cats are not fans of the new food          Negative
14  Fresh and greasy!                             Slightly positive
15  Strawberry Twizzlers - yummy                  Positive

Table 3. Description of dataset

Name       Variable type  Variable description
ID         Input          Unique ID of each review
Review     Input          Comments about food products from social media pages
Sentiment  Output         The label associated with each review

The description of the dataset is given in Table 3.

Experimental results

The experimental results for the two classifiers are presented in the form of confusion matrices, showcasing the counts of true positives, false negatives, true negatives, and false positives. These matrices offer a comprehensive view of the performance of each classifier, as outlined in Table 4. True Positives (TP): tested positive and the review is actually positive. True Negatives (TN): tested negative and the review is actually negative. False Positives (FP): tested positive and the review is not (otherwise known as a "Type I error"). False Negatives (FN): tested negative and the review is not (otherwise known as a "Type II error").

Table 4. Structure of confusion matrix

                 Predicted negative  Predicted positive
Actual negative  True Neg. (TN)      False Pos. (FP)
Actual positive  False Neg. (FN)     True Pos. (TP)
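The metric definitions above translate directly into code. The sketch below computes precision, recall, and F1 from the Table 4 counts; the numeric values are hypothetical, for illustration only.

```python
# Precision, recall, and F1 from confusion-matrix counts (Table 4 notation).
def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp)          # how often a positive prediction is right

def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn)          # how many actual positives are found

def f1_score(p: float, r: float) -> float:
    return 2 * p * r / (p + r)     # harmonic mean of precision and recall

# Hypothetical counts for one class out of the five:
tp, fp, fn = 80, 20, 30
p, r = precision(tp, fp), recall(tp, fn)
print(round(p, 2), round(r, 2), round(f1_score(p, r), 2))  # 0.8 0.73 0.76
```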

Results for the Naïve Bayes classifier

The Naïve Bayes algorithm was employed to classify the polarity of documents within the dataset. This algorithm categorizes reviews as either positive, slightly negative, slightly positive, neutral, or negative. Upon testing one of the reviews from the dataset, the outcome revealed its polarity classification. Table 5 displays the experimental results, indicating that 79,658 correct samples were identified out of 113,683 reviews using the Naïve Bayes classifier, as determined from the confusion matrix.

Table 5. Experimental result of the Naïve Bayes classifier

Total reviews     113,683
Classifier        Naive Bayes
Correct sample    79,658
Incorrect sample  34,025

The representations in Table 5 are given as follows: correct samples = summation of all TP values; incorrect samples = summation of all FN and FP values.

Out of the total 113,683 reviews, 79,658 were correctly classified while 34,025 were incorrectly classified. The Naïve Bayes classifier demonstrated a higher number of correct classifications compared to incorrect ones. The results of the confusion matrix of the Naïve Bayes classifier are given in Figure 3, and Figure 4 shows a bar chart depicting the output from the classifier.

Figure 3. Confusion matrix of Naïve Bayes.

The implication of Figure 3 is that, for the different data samples, the predicted values for the positive class are close to the actual values. This signifies that the model is accurate.

Figure 4. Bar chart depicting output from the Naïve Bayes classifier.

Results for the Random Forest classifier

The Random Forest algorithm was employed to classify the polarity of documents within the dataset. This algorithm categorizes reviews as positive, slightly negative, slightly positive, neutral, or negative. Upon testing one of the reviews from the dataset, the outcome revealed its polarity classification. Table 6 displays the experimental results, indicating that 86,898 correct samples were identified out of 113,683 reviews using the Random Forest algorithm, as derived from the confusion matrix presented in Figure 5. Furthermore, Figure 6 illustrates a pie chart representing the output from the Random Forest classifier. As before, correct samples = summation of all TP values and incorrect samples = summation of all FN and FP values.

Figure 5. Confusion matrix of Random Forest.

Table 6. Experimental result of the Random Forest classifier

Total reviews     113,683
Classifier        Random forest
Correct sample    86,898
Incorrect sample  26,788

Figure 6. Pie chart depicting output from the Random Forest classifier.
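The paper does not publish its implementation, so the following is a minimal sketch of how such a two-classifier experiment is commonly assembled with scikit-learn. The placeholder data, the 80/20 split, and the hyperparameters are assumptions, not the authors' settings; in the study the inputs would be the 113,683 scraped reviews and their five sentiment labels.

```python
# Minimal sketch of the two-classifier experiment with scikit-learn.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Placeholder data; in the study these come from the scraped review corpus.
reviews = ["Sweet food", "Bad food", "Not good as advertised",
           "Great taffy", "Cough medicine", "Healthy dog food"] * 10
sentiments = ["Positive", "Negative", "Negative",
              "Slightly positive", "Neutral", "Positive"] * 10

X_train, X_test, y_train, y_test = train_test_split(
    reviews, sentiments, test_size=0.2, random_state=42)  # assumed split ratio

vectorizer = CountVectorizer()                  # bag-of-words features
X_train_bow = vectorizer.fit_transform(X_train)
X_test_bow = vectorizer.transform(X_test)

for model in (MultinomialNB(), RandomForestClassifier(n_estimators=100)):
    model.fit(X_train_bow, y_train)
    y_pred = model.predict(X_test_bow)
    print(type(model).__name__, accuracy_score(y_test, y_pred))
    # Per-class precision/recall/F1, in the style of Tables 7-8:
    print(classification_report(y_test, y_pred, zero_division=0))
```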

Performance evaluation

Tables 7-8 are the individual classification reports for the two techniques, as utilized for performance evaluation.

Table 7: Naïve Bayes classification report

               Precision  Recall  F1 score
Negative       0.63       0.48    0.55
Neutral        0.58       0.06    0.11
Positive       0.72       0.99    0.83
Slightly neg.  0.59       0.15    0.24
Slightly pos.  0.51       0.09    0.15
Avg. Total     0.66       0.70    0.63

Table 8: Random Forest classification report

               Precision  Recall  F1 score
Negative       0.67       0.66    0.66
Neutral        0.55       0.37    0.44
Positive       0.81       0.94    0.87
Slightly neg.  0.61       0.43    0.51
Slightly pos.  0.63       0.36    0.45
Avg. Total     0.74       0.77    0.74
Discussion

At the end of the experimental analysis, the Naïve Bayes classifier (Table 5) obtained an accuracy of 70.01% on the test data, while the Random Forest classifier (Table 6) obtained 76.5%; therefore, the best accuracy was given by the Random Forest classifier. The percentage accuracy of the classifiers is given in Table 9. Accuracy is calculated by:

Accuracy = (Number of correct samples / Total number of samples) × 100    (15)

The implication of the results obtained from Equation (15) is that both text classifiers produce more correct samples than incorrect samples.

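As a quick arithmetic check, Equation (15) applied to the counts in Tables 5 and 6 reproduces the accuracies summarized in Table 9 to rounding:

```python
# Accuracy = correct samples / total samples (Equation 15).
total = 113_683
print(f"Naive Bayes:   {79_658 / total:.2%}")   # prints 70.07%, i.e. about 70%
print(f"Random forest: {86_898 / total:.2%}")   # prints 76.44%, i.e. about 76%
```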

Table 9. Percentage accuracy of the classifiers

Dataset         Classifier     Accuracy of classifier
Product review  Naïve Bayes    70.01%
Product review  Random forest  76.50%

CONCLUSIONS

For many large and mid-sized companies, understanding customer sentiments and opinions regarding their products and services is crucial due to the significant impact these sentiments can have on the company's financial performance. In this study, experimental analysis was carried out on a dataset comprising product reviews. Both the Naive Bayes classifier and the Random Forest classifier were trained on the dataset, and the Random Forest classifier was observed to outperform the Naive Bayes classifier. Going forward, it is recommended to explore the development of a mobile application or a user-friendly graphical interface. Such tools would enable individuals without programming skills to easily assess and understand their customers' sentiments towards their products. This approach would facilitate broader accessibility and utilization of sentiment analysis tools, empowering companies to make informed decisions based on customer feedback.
REFERENCES

[1] Rejeb, A., Rejeb, K., Appolloni, A., Treiblmaier, H., & Iranmanesh, M. (2024). Exploring the impact of ChatGPT on education: A web mining and machine learning approach. The International Journal of Management Education, 22(1), 100932.

[2] Choe, S., Mehta, S. V., Ahn, H., Neiswanger, W., Xie, P., Strubell, E., & Xing, E. (2024). Making scalable meta learning practical. Advances in Neural Information Processing Systems, 36.

[3] Abubakr, M., Rady, M., Badran, K., & Mahfouz, S. Y. (2024). Application of deep learning in damage classification of reinforced concrete bridges. Ain Shams Engineering Journal, 15(1), 102297.

[4] Mupaikwa, E. (2025). The application of artificial intelligence and machine learning in academic libraries. In Encyclopedia of Information Science and Technology, Sixth Edition (pp. 1-18). IGI Global.

[5] Büyükkeçeci, M., & Okur, M. C. (2024). A comprehensive review of feature selection and feature selection stability in machine learning. Gazi University Journal of Science, 1-1.

[6] Valtonen, L., Mäkinen, S. J., & Kirjavainen, J. (2024). Advancing reproducibility and accountability of unsupervised machine learning in text mining: Importance of transparency in reporting preprocessing and algorithm selection. Organizational Research Methods, 27(1), 88-113.

[7] Shamshiri, A., Ryu, K. R., & Park, J. Y. (2024). Text mining and natural language processing in construction. Automation in Construction, 158, 105200.

[8] Duan, H. K., Vasarhelyi, M. A., Codesso, M., & Alzamil, Z. (2023). Enhancing the government accounting information systems using social media information: An application of text mining and machine learning. International Journal of Accounting Information Systems, 48, 100600.

[9] Umer, M., Imtiaz, Z., Ahmad, M., Nappi, M., Medaglia, C., Choi, G. S., & Mehmood, A. (2023). Impact of convolutional neural network and FastText embedding on text classification. Multimedia Tools and Applications, 82(4), 5569-5585.

[10] Pal, S., Biswas, B., Gupta, R., Kumar, A., & Gupta, S. (2023). Exploring the factors that affect user experience in mobile-health applications: A text-mining and machine-learning approach. Journal of Business Research, 156, 113484.

[11] Kariri, E., Louati, H., Louati, A., & Masmoudi, F. (2023). Exploring the advancements and future research directions of artificial neural networks: A text mining approach. Applied Sciences, 13(5), 3186.

[12] Abdusalomovna, T. D. (2023). Text mining. European Journal of Interdisciplinary Research and Development, 13, 284-289.

[13] Spiteri, G., Fielding, J., Diercke, M., Campese, C., Enouf, V., Gaymard, A., Bella, A., Sognamiglio, P., Moros, M. J. S., Riutort, A. N., & Demina, Y. V. (2020). First cases of coronavirus disease 2019 (COVID-19) in the WHO European Region, 24 January to 21 February 2020. Eurosurveillance, 25(9), 2000178.
