Abstract— The rapid growth of textual data across various platforms necessitates efficient real-time text classification and summarization methods. This paper presents a novel approach for developing a real-time text classification and summarization model that addresses the limitations of traditional methods, such as TF-IDF, in extractive summarization. The proposed algorithm implements advanced preprocessing techniques and a similarity matrix for enhanced pattern recognition, which leads to more accurate and contextually relevant summaries. Additionally, the model integrates logistic regression to ensure precise topic identification.

Keywords— Real-Time Text Classification, Extractive Summarization, Similarity Matrix, Information Retrieval, Text Summarization Model

I. INTRODUCTION

The exponential growth of textual data has created remarkable challenges for the accurate processing of content. From social media posts to academic publications, the need for efficient methods to classify and summarize text in real time has become more critical than ever. Traditional approaches, such as Term Frequency-Inverse Document Frequency (TF-IDF), have long been the standard for extractive summarization. However, this method cannot capture the significant relationships between words and fails to adapt to the dynamic nature of real-time data.

The objective of this work is to provide an efficient real-time text classification and summarization framework that extracts accurate, concise information for the user. This paper therefore introduces a novel framework that addresses the limitations of TF-IDF by employing an advanced approach to text classification and summarization. The proposed model implements advanced preprocessing techniques, including tokenization, stop-word removal, and stemming, to prepare the text for analysis. It then utilizes a self-similarity matrix to enhance pattern recognition, resulting in more accurate and contextually relevant summaries.

Unlike conventional methods, this framework also integrates a machine learning-based classification system that ensures precise topic identification, thereby improving the overall quality of the extracted summaries; a minimal sketch of these stages is given after this section's outline. The paper is organized into five sections. Section I presents the introduction. Section II reviews related research works, summarized in Table I. Section III describes the methodologies implemented in this work. Section IV discusses the results and compares them with other methodologies. Section V concludes the paper and is followed by the references.
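As a concrete illustration of the preprocessing and classification stages described above, the following is a minimal sketch assuming NLTK and scikit-learn as the toolkit (the paper does not specify its libraries); the sample documents, topic labels, and test sentence are illustrative placeholders, not data from this work.

```python
# Minimal sketch of the preprocessing (tokenization, stop-word removal,
# stemming) and logistic regression classification stages. The corpus,
# labels, and test sentence below are illustrative placeholders.
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

nltk.download("punkt", quiet=True)
nltk.download("punkt_tab", quiet=True)  # required by newer NLTK releases
nltk.download("stopwords", quiet=True)

stemmer = PorterStemmer()
stop_words = set(stopwords.words("english"))

def preprocess(text: str) -> str:
    """Tokenize, drop stop words and non-alphabetic tokens, then stem."""
    tokens = word_tokenize(text.lower())
    kept = [stemmer.stem(t) for t in tokens
            if t.isalpha() and t not in stop_words]
    return " ".join(kept)

# Placeholder training corpus with topic labels.
docs = [
    "Stock markets rallied after the central bank cut interest rates.",
    "The striker scored twice in the final minutes of the match.",
    "New GPU architectures accelerate the training of language models.",
]
labels = ["finance", "sports", "technology"]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform([preprocess(d) for d in docs])
classifier = LogisticRegression(max_iter=1000).fit(X, labels)

test = vectorizer.transform([preprocess("A striker scored in the match.")])
print(classifier.predict(test))  # most likely 'sports', given the shared vocabulary
```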
II. LITERATURE REVIEW

TABLE I. SUMMARY OF PREVIOUS WORK

1. Paper Title: A Ranking based Language Model for Automatic Extractive Text Summarization
Authors and Year: Pooja Gupta, Swati Nigam, Rajiv Singh (2022)
Problem Statement: It focuses on extractive text summarization using sentence ranking.
Approach used: The authors introduce a new language model based on n-grams (unigrams, bigrams, trigrams) for sentence ranking to generate summaries.
Result:
Pros: The proposed n-gram-based language model for sentence ranking showed improved accuracy for text summarization, achieving 44% on the BBC News dataset and 36% on the CNN dataset.
Cons: The evaluation was conducted on only two datasets, which may not fully represent the method's effectiveness across diverse text sources. The proposed model can be computationally intensive, requiring significant processing power and time.
8. Paper Title: Extractive Text Summarization: An effective approach to extract information from Text
Authors and Year: Asha Rani Mishra, V. K. Panchal, Pawan Kumar (2019)
Problem Statement: It addresses the challenge of text summarization using extractive methods.
Approach used: The authors employ machine learning, LDA, TextRank, and topic modelling to rank and extract key sentences.
Result:
Pros: Achieved an accuracy of 85% in summarizing text.
Cons: Struggles with highly diverse datasets, leading to lower recall rates.

9. Paper Title: Implementation of Novel Test Rank Algorithm for Effective Text Summarization
Authors and Year: D. Ganesh, Mungara Kiran Kumar, Jasti Varsha, Kimavath Jayanth Naik, K. Pranusha (2023)
Problem Statement: It addresses the challenge of text summarization using extractive methods.
Approach used: The authors employ machine learning, LDA (Latent Dirichlet Allocation), TextRank, and topic modeling, with analysis, extraction, and generation phases to create summaries.
Result:
Pros: The algorithm achieved a precision of 0.9557 for ROUGE-1 and ROUGE-3, indicating accurate summarization.
Cons: The recall score for ROUGE-2 was 0.1812, indicating some difficulty in retrieving relevant information for this metric.

10. Paper Title: Query-oriented Text Summarization using Sentence Extraction Technique
Authors and Year: Mahsa Afsharizadeh, Hossein Ebrahimpour-Komleh (2018)
Problem Statement: It addresses the challenge of extracting the most important information from large volumes of text efficiently.
Approach used: The authors use methods such as TF-IDF, fuzzy logic, graph-based methods, and Latent Semantic Analysis (LSA) for feature extraction and sentence scoring.
Result:
Pros: The proposed method achieved a higher ROUGE-2 average recall of 0.07579 compared to the previous method's 0.06887.
Cons: The method still faces challenges in accurately identifying the most informative sentences due to the complexity of feature extraction.

11. Paper Title: Towards Extractive Text Summarization using Multidimensional Knowledge Representation
Authors and Year: Johannes Zenkert, André Klahold, Madjid Fathi (2018)
Problem Statement: It addresses the challenge of extractive text summarization using multidimensional knowledge representation (MKR) to handle the complexity of natural language and the vast amount of information available online.
Approach used: The authors integrated named entity recognition (NER), sentiment analysis (SA), and topic detection (TD) to create a structured knowledge base for summarization.
Result:
Pros: The MKR framework allows for efficient filtering and selection of relevant information, improving the accuracy and relevance of summaries.
Cons: The approach requires extensive preprocessing and integration of multiple text mining methods, which can be computationally intensive.

12. Paper Title: Query-oriented Text Summarization using Sentence Extraction Technique
Authors and Year: Mahsa Afsharizadeh, Hossein Ebrahimpour-Komleh, Ayoub Bagheri (2018)
Problem Statement: It addresses the challenge of summarizing large volumes of text quickly and effectively, focusing on query-oriented text summarization.
Approach used: Key techniques include text pre-processing (tokenization, stop-word removal, stemming, POS tagging), feature extraction, and sentence scoring.
Result:
Pros: The method was evaluated on the DUC 2007 corpus and showed improved performance in ROUGE metrics compared to previous methods.
Cons: It requires significant computational resources and is complex to implement.
In this distribution, we can observe self-similarity because the pattern of decreasing frequencies is repeated at different scales:

• The top 10 words have a similar pattern of decreasing frequencies as the top 100 words.
• The top 100 words have a similar pattern of decreasing frequencies as the top 1000 words.

B. Fractional Dimension

The fractional dimension is a measure of the self-similarity of a distribution. In the context of word frequency distribution, the fractional dimension (D) is a value between 0 and 1 that quantifies the degree of self-similarity. A high fractional dimension (D → 1) indicates a high degree of self-similarity, meaning that the distribution has a strong repeating pattern at different scales. This is often observed in natural language texts, where the frequency distribution of words follows a power-law distribution. A low fractional dimension (D → 0) indicates a low degree of self-similarity, meaning that the distribution has a more random or uniform pattern.

C. Procedure for Capturing Self-Similarity from Fractional Dimension

To calculate the fractal dimension, we can use the box-counting dimension method. For example, suppose we have a word frequency distribution with the following values:

TABLE II: CALCULATION OF THE FRACTAL DIMENSION
Word    Frequency
the     10
and     8
a       6
of      5
to      4
...     ...

To calculate the fractal dimension, we can follow these steps:
Step 1: Divide the frequency range into boxes of size ε (e.g., ε = 1, 2, 4, 8, ...).
Step 2: Count the number of boxes that contain at least one word frequency (N(ε)).
Step 3: Plot log(N(ε)) against log(1/ε) and fit a straight line. The slope of the line is the fractional dimension (D).

TABLE III: EXAMPLE OF THE PLOT
ε    N(ε)    log(N(ε))    log(1/ε)
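As a concrete illustration of Steps 1 to 3 (the kind of computation that would populate Table III), the following is a minimal sketch of the box-counting estimate applied to the frequency values from Table II; the box sizes follow the example in Step 1, and the trailing "..." rows of Table II are necessarily omitted. This is an illustrative sketch, not the paper's implementation.

```python
# Box-counting sketch for the fractional dimension (Steps 1-3), using the
# frequency values from Table II; the trailing "..." rows are omitted.
import numpy as np

frequencies = np.array([10, 8, 6, 5, 4])  # Table II: the, and, a, of, to

box_sizes = [1, 2, 4, 8]  # Step 1: box sizes epsilon
counts = []
for eps in box_sizes:
    # Step 2: number of boxes of width eps (tiling the frequency axis)
    # that contain at least one observed frequency value, N(eps).
    occupied = {int(f // eps) for f in frequencies}
    counts.append(len(occupied))

# Step 3: fit log N(eps) against log(1/eps); the slope estimates D.
log_n = np.log(counts)
log_inv_eps = np.log(1.0 / np.array(box_sizes, dtype=float))
slope, _ = np.polyfit(log_inv_eps, log_n, 1)
print(f"estimated fractional dimension D = {slope:.3f}")  # ~0.5 for this data
```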
making them more representative of the text's intrinsic structure. Finally, cosine similarity is employed to measure the effectiveness of the enhanced summarization against the original text.
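As a concrete illustration of this evaluation step, the following is a minimal sketch that represents the original text and its summary as TF-IDF vectors and scores their cosine similarity; the vectorizer choice is an assumption (the paper does not name a library), and the two strings are illustrative placeholders.

```python
# Minimal sketch of the evaluation step: score how well a summary
# preserves the original text via TF-IDF cosine similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

original = ("Text summarization condenses a document while preserving its "
            "key information, and extractive methods select salient sentences.")
summary = "Extractive summarization selects the document's salient sentences."

# Fit one vocabulary over both texts so the vectors are comparable.
vectors = TfidfVectorizer().fit_transform([original, summary])
score = cosine_similarity(vectors[0], vectors[1])[0, 0]
print(f"cosine similarity between original and summary: {score:.3f}")
```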
REFERENCES
[1] P. Gupta, S. Nigam and R. Singh, "A Ranking based Language Model for Automatic Extractive Text Summarization," 2022 First International Conference on Artificial Intelligence Trends and Pattern Recognition (ICAITPR), Hyderabad, India, 2022, pp. 1-5, doi: 10.1109/ICAITPR51569.2022.9844187.
[2] K. Ramani, K. Bhavana, A. Akshaya, K. S. Harshita, C. R. Thoran Kumar and M. Srikanth, "An Explorative Study on Extractive Text Summarization through k-means, LSA, and TextRank," 2023 International Conference on Wireless Communications Signal Processing and Networking (WiSPNET), Chennai, India, 2023, pp. 1-6, doi: 10.1109/WiSPNET57748.2023.10134303.
[3] D. Jain, M. D. Borah and A. Biswas, "Summarization of legal documents: Where are we now and the way forward," Computer Science Review, vol. 40, 100388, 2021, doi: 10.1016/j.cosrev.2021.100388.
[4] N. Pandey, S. Kumar, V. Ranjan, M. Ahamed and A. K. Sahoo, "Analyzing Extractive Text Summarization Techniques and Classification Algorithms: A Comparative Study," 2024 International Conference on Advancements in Smart, Secure and Intelligent Computing (ASSIC), Bhubaneswar, India, 2024, pp. 1-5, doi: 10.1109/ASSIC60049.2024.10508020.
[5] T. Islam, M. Hossain and M. F. Arefin, "Comparative Analysis of Different Text Summarization Techniques Using Enhanced Tokenization," 2021 3rd International Conference on Sustainable Technologies for Industry 4.0 (STI), Dhaka, Bangladesh, 2021, pp. 1-6, doi: 10.1109/STI53101.2021.9732589.
[6] M. Majeed and K. M. T, "Comparative Study on Extractive Summarization Using Sentence Ranking Algorithm and Text Ranking Algorithm," 2023 International Conference on Power, Instrumentation, Control and Computing (PICC), Thrissur, India, 2023, pp. 1-5, doi: 10.1109/PICC57976.2023.10142314.
[7] S. Jugran, A. Kumar, B. S. Tyagi and V. Anand, "Extractive Automatic Text Summarization using SpaCy in Python & NLP," 2021 International Conference on Advance Computing and Innovative Technologies in Engineering (ICACITE), Greater Noida, India, 2021, pp. 582-585, doi: 10.1109/ICACITE51222.2021.9404712.
[8] D. Ganesh, M. K. Kumar, J. Varsha, K. J. Naik, K. Pranusha and J. Mallika, "Implementation of Novel Test Rank Algorithm for Effective Text Summarization," 2023 International Conference on Advances in Computing, Communication and Applied Informatics (ACCAI), Chennai, India, 2023, pp. 1-6, doi: 10.1109/ACCAI58221.2023.10201008.
[9] M. Afsharizadeh, H. Ebrahimpour-Komleh and A. Bagheri, "Query-oriented text summarization using sentence extraction technique," 2018 4th International Conference on Web Research (ICWR), Tehran, Iran, 2018, pp. 128-132, doi: 10.1109/ICWR.2018.8387248.
[10] J. Zenkert, A. Klahold and M. Fathi, "Towards Extractive Text Summarization Using Multidimensional Knowledge Representation," 2018 IEEE International Conference on Electro/Information Technology (EIT), Rochester, MI, USA, 2018, pp. 0826-0831, doi: 10.1109/EIT.2018.8500186.
[11] J. Quillo-Espino, R. M. Romero-González and A.-M. Herrera-Navarro, "A Deep Look into Extractive Text Summarization," Journal of Computer and Communications, vol. 9, pp. 24-37, 2021, doi: 10.4236/jcc.2021.96002.
[12] G. C. Megharaj and V. Jituri, "TFIDF Model based Text Summerization," International Journal of Engineering Research & Technology (IJERT), RTCSIT – 2022, vol. 10, no. 12, 2022.
[13] R. Haruna, A. Obiniyi, M. Abdulkarim and A. A. Afolorunsho, "Automatic Summarization of Scientific Documents Using Transformer Architectures: A Review," 2022 5th Information Technology for Education and Development (ITED), Abuja, Nigeria, 2022, pp. 1-6, doi: 10.1109/ITED56637.2022.10051602.
[14] P. Watanangura, S. Vanichrudee, O. Minteer et al., "A Comparative Survey of Text Summarization Techniques," SN Computer Science, vol. 5, 47, 2024, doi: 10.1007/s42979-023-02343-6.