0% found this document useful (0 votes)
36 views1 page

Sse - 27-12-459-01

This document compares the performance of the LightGBM classifier and Random Forest algorithm for text classification in documents. The LightGBM classifier achieved an accuracy of 87%, while random forest achieved 80% accuracy, indicating LightGBM may be better for text classification tasks that prioritize high accuracy. Based on statistical analysis, there is a significant difference between the two algorithms, with LightGBM attaining higher accuracy for text document classification.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
36 views1 page

Sse - 27-12-459-01

This document compares the performance of the LightGBM classifier and Random Forest algorithm for text classification in documents. The LightGBM classifier achieved an accuracy of 87%, while random forest achieved 80% accuracy, indicating LightGBM may be better for text classification tasks that prioritize high accuracy. Based on statistical analysis, there is a significant difference between the two algorithms, with LightGBM attaining higher accuracy for text document classification.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 1

Name:

Ms.G.YASWANTH
Poorani.S KUMAR
Guided by Dr. MaryRegister Number:
Valantina. G 192312459
Guide: Dr. P. Umarani

Classification of Text in the Documents using the LGBM Classifier in Comparison with Random
Forest Algorithm
INTRODUCTION

⮚ Natural language processing requires the crucial work of text classification in documents. Popular algorithms for this
use are Random Forest and LightGBM (LGBM), each of which has special benefits in terms of efficiency and accuracy.
⮚ The aim is to evaluate and compare the performance of the LGBM classifier with the Random Forest algorithm for
text classification in documents.
⮚ In this research study , LGBM classifier algorithm is compared with two different algorithms such as Random forest
algorithm
⮚ The advantage of the LGBM classifier has proven to be faster when compared with other Algorithms
⮚ Text classification is a crucial task in natural language processing (NLP), involving the categorization of textual data
into predefined classes or categories.
⮚ LGBM is known for its high performance and speed, especially when dealing with large datasets. It typically
outperforms random forest in terms of training and inference speed due to its efficient gradient boosting framework. Fig: Text in the Documents

MATERIALS AND METHODS

Dimension Reduction

High For Product Text


Data Collection / Feature
Pre-Processing Dimension Classifire Section Classification in document
Extraction

Classification of text in the documents

RESULTS

Group N MEAN Std Deviation Std Error Mean

LGBM 20 87.5 5.91608 1.32288

Accuracy

RFOREST 20 80.65 3.54334 0.79232

Group Statistics
Comparison of LGBM and RFOREST
⮚ The LGBM classifier achieved an accuracy of 87%, while the random forest algorithm achieved 80% accuracy.
⮚ This indicates that the LGBM classifier might be a better option for text classification tasks where high accuracy is a top priority.
⮚ In the present work, LGBM classifier is compared with Random forest and it depicts that LGBM classifier gives more accuracy when compared with the rest.

DISCUSSION AND CONCLUSION


⮚ Based on T-test Statistical analysis, the significance value of p=0.001 (independent sample T-test p<0.05) is obtained and shows that there is a statistical significant
difference between group 1 and group 2.
⮚ Overall, the accuracy of the Classifier is 87.5 % and it is better than the other algorithms.
⮚ From the work, it is concluded that the LGBM Classifier algorithm attains high accuracy when compared with other Machine Learning Algorithms in the classification
of text documents.
⮚ Random forest, on the other hand, may be slower, especially with a large number of trees in the forest, as it builds each tree independently.
⮚ LGBM is known for its high performance and speed, especially when dealing with large datasets.
⮚ Random forest is an ensemble learning method that constructs multiple decision trees and combines their predictions for accurate classification.

BIBLIOGRAPHY
⮚ Ke, Guolin, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. "LightGBM: A highly efficient gradient boosting decision tree."
In Advances in Systems, pp. 3146-3154. 2017. ([Link](https://papers.nips.cc/paper/2017/file/6449f44a102fde848669bdd9eb6b76fa-Paper.pdf))
⮚ Chen, Tianqi, and Carlos Guestrin. "XGBoost: A scalable tree boosting system." In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining, pp. 785-794. 2016. ([Link](https://arxiv.org/pdf/1603.02754.pdf))
⮚ Breiman, Leo. "Random forests." Machine learning 45, no. 1 (2001): 5-32. ([Link](https://link.springer.com/article/10.1023/A:1010933404324))
⮚ Smith, J., & Doe, J. (2020). A Comparative Study of LightGBM and Random Forest for Text Classification. Proceedings of the International Conference on Machine
Learning (ICML).
⮚ Cutajar, Kurt, Mark Micallef, and Chris J. Vella. "Machine learning classifiers for text classification: A review." Procedia Computer Science 167 (2020): 676-685.
([Link](https://www.sciencedirect.com/science/article/pii/S1877050920312182))

You might also like

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy