Abstract
The difficulty of extracting value from the vast scientific literature in condensed form led us to seek an efficient theme-based summarization solution that preserves precise biomedical content. This study analyzes the impact of combining semantic biomedical concept extraction, frequent item-set mining, and clustering techniques on information retention, objective functions, and ROUGE values of the final summary. The proposed frequent item-set mining and clustering (FRI-CL) graph-based framework uses the UMLS Metathesaurus and BERT-based semantic embeddings to identify domain-relevant concepts. The screened concepts are mined according to their frequency and their relationship with neighboring concepts via an amended FP-Growth model. The framework uses S-DPMM clustering, a probabilistic mixture model, which helps identify and group complex relevant patterns to increase coverage of important sub-themes. Sentences containing the frequent concepts are scored via PageRank to form an efficient and compelling summary. Experiments on 100 sample biomedical documents taken from the PubMed archives are evaluated by computing ROUGE scores, coverage, readability, non-redundancy, memory utilization, and information retention of the summary output. The FRI-CL summarization system showed a 10% improvement in ROUGE performance and is on par with the other baseline methods. On average, a 30–40% improvement in memory utilization is observed, with up to 50% information retention, when experiments are performed using S-DPMM clustering. The research indicates that fusing semantic mapping, clustering, and frequent item-set mining of biomedical concepts enhances the coverage of correlated information across all sub-themes.
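The mining and ranking steps described above can be illustrated with a minimal sketch. It is not the FRI-CL implementation, and several simplifications are assumptions made for illustration: concepts are given as plain string sets per sentence (no UMLS or BERT mapping), frequent item-sets are found by brute-force counting rather than the amended FP-Growth model, the S-DPMM clustering step is omitted, and edge weights use Jaccard overlap of shared frequent item-sets. The names `frequent_itemsets`, `pagerank`, and `summarize` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force frequent item-set counting (a stand-in, not FP-Growth)."""
    counts = Counter()
    for concepts in transactions:
        for size in range(1, max_size + 1):
            for itemset in combinations(sorted(concepts), size):
                counts[frozenset(itemset)] += 1
    return {s for s, c in counts.items() if c >= min_support}


def pagerank(weights, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration over an adjacency matrix."""
    n = len(weights)
    scores = [1.0 / n] * n
    out_sums = [sum(row) for row in weights]
    for _ in range(iterations):
        updated = []
        for j in range(n):
            incoming = sum(
                scores[i] * weights[i][j] / out_sums[i]
                for i in range(n)
                if out_sums[i] > 0
            )
            updated.append((1 - damping) / n + damping * incoming)
        scores = updated
    return scores


def summarize(sentence_concepts, min_support=2, top_k=2):
    """Return indices of the top_k sentences, in document order."""
    frequent = frequent_itemsets(sentence_concepts, min_support)
    # Frequent item-sets fully contained in each sentence's concept set.
    covered = [{s for s in frequent if s <= c} for c in sentence_concepts]
    n = len(covered)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            union = covered[i] | covered[j]
            if i != j and union:
                # Jaccard overlap of shared frequent item-sets (assumed weighting).
                weights[i][j] = len(covered[i] & covered[j]) / len(union)
    scores = pagerank(weights)
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:top_k])


# Toy input: one concept set per sentence (hypothetical data).
sentences = [
    {"diabetes", "insulin"},
    {"diabetes", "insulin", "glucose"},
    {"glucose", "diabetes"},
    {"weather"},
]
selected = summarize(sentences, min_support=2, top_k=1)
```

In this toy input the second sentence shares the most frequent item-sets with its neighbors, so it receives the highest score, while the off-topic sentence has no edges and falls to the bottom of the ranking.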






Data availability
Data can be provided on request.
Acknowledgements
No Acknowledgements
Funding
No funding is involved in this work.
Contributions
There is no authorship contribution.
Ethics declarations
Conflict of interest
Conflict of interest is not applicable in this work.
Ethical approval
No participation of humans takes place in this implementation process.
Human and animal rights
No violation of Human and Animal Rights is involved.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Gupta, S., Sharaff, A. & Nagwani, N.K. Frequent item-set mining and clustering based ranked biomedical text summarization. J Supercomput 79, 139–159 (2023). https://doi.org/10.1007/s11227-022-04578-1
DOI: https://doi.org/10.1007/s11227-022-04578-1