Sse - 27-12-459-01
Sse - 27-12-459-01
Ms.G.YASWANTH
Poorani.S KUMAR
Guided by Dr. MaryRegister Number:
Valantina. G 192312459
Guide: Dr. P. Umarani
Classification of Text in the Documents using the LGBM Classifier in Comparison with Random
Forest Algorithm
INTRODUCTION
⮚ Natural language processing requires the crucial work of text classification in documents. Popular algorithms for this
use are Random Forest and LightGBM (LGBM), each of which has special benefits in terms of efficiency and accuracy.
⮚ The aim is to evaluate and compare the performance of the LGBM classifier with the Random Forest algorithm for
text classification in documents.
⮚ In this research study , LGBM classifier algorithm is compared with two different algorithms such as Random forest
algorithm
⮚ The advantage of the LGBM classifier has proven to be faster when compared with other Algorithms
⮚ Text classification is a crucial task in natural language processing (NLP), involving the categorization of textual data
into predefined classes or categories.
⮚ LGBM is known for its high performance and speed, especially when dealing with large datasets. It typically
outperforms random forest in terms of training and inference speed due to its efficient gradient boosting framework. Fig: Text in the Documents
Dimension Reduction
RESULTS
Accuracy
Group Statistics
Comparison of LGBM and RFOREST
⮚ The LGBM classifier achieved an accuracy of 87%, while the random forest algorithm achieved 80% accuracy.
⮚ This indicates that the LGBM classifier might be a better option for text classification tasks where high accuracy is a top priority.
⮚ In the present work, LGBM classifier is compared with Random forest and it depicts that LGBM classifier gives more accuracy when compared with the rest.
BIBLIOGRAPHY
⮚ Ke, Guolin, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. "LightGBM: A highly efficient gradient boosting decision tree."
In Advances in Systems, pp. 3146-3154. 2017. ([Link](https://papers.nips.cc/paper/2017/file/6449f44a102fde848669bdd9eb6b76fa-Paper.pdf))
⮚ Chen, Tianqi, and Carlos Guestrin. "XGBoost: A scalable tree boosting system." In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining, pp. 785-794. 2016. ([Link](https://arxiv.org/pdf/1603.02754.pdf))
⮚ Breiman, Leo. "Random forests." Machine learning 45, no. 1 (2001): 5-32. ([Link](https://link.springer.com/article/10.1023/A:1010933404324))
⮚ Smith, J., & Doe, J. (2020). A Comparative Study of LightGBM and Random Forest for Text Classification. Proceedings of the International Conference on Machine
Learning (ICML).
⮚ Cutajar, Kurt, Mark Micallef, and Chris J. Vella. "Machine learning classifiers for text classification: A review." Procedia Computer Science 167 (2020): 676-685.
([Link](https://www.sciencedirect.com/science/article/pii/S1877050920312182))