Research on semantic representation and citation recommendation of scientific papers with multiple semantics fusion

Lu, Yonghe; Yuan, Meilu; Liu, Jiaxin; Chen, Minghong

doi:10.1007/s11192-022-04566-5

Published: 03 January 2023

Scientometrics, Volume 128, pages 1367–1393 (2023)

Yonghe Lu¹, Meilu Yuan¹, Jiaxin Liu¹ & Minghong Chen¹ (ORCID: orcid.org/0000-0002-3080-9455)

Abstract

As the number of scientific papers grows, citation recommendation, which helps researchers find useful references efficiently and thereby promotes academic communication and cooperation, has become increasingly important. However, little research has explored how to identify semantically relevant references according to the research scenario and the citation context of a paper. To address this gap, the present study uses SciBERT to represent text and expands its semantics by fusing WordNet knowledge. In addition, TextRank automatically extracts the core themes of references to address the problem of incomplete content extraction. On this basis, a model named SciBERT + DPCNN is constructed for semantic representation and citation recommendation of scientific papers. Experiments in three parts are then designed and carried out to verify the effectiveness of the model. First, SciBERT + DPCNN outperforms all baseline models. Second, with one WordNet fusion at the end of the sentence, the model reaches its best results: 84.72% accuracy, 84.80% precision, 84.72% recall, and an F1-score of 84.71%. Finally, regarding the structure of the reference text used for classification, without WordNet fusion the long text ‘title + abstract + TextRank full text (except the title and abstract)’ outperforms the short text ‘title + abstract’ in most cases. With WordNet fusion, however, the short text is more accurate than the long text in most cases.
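The abstract names TextRank as the step that extracts core content from a reference's full text, but this page does not show how the authors configured it. As a rough illustration only, the following is a minimal sketch of extractive TextRank over sentences, assuming the word-overlap similarity from the original TextRank formulation (Mihalcea & Tarau, 2004) and a fixed number of power-iteration steps. The function names (`tokenize`, `overlap_similarity`, `textrank`, `top_sentences`) and the parameter defaults are hypothetical and are not taken from the paper.

```python
import math
import re


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def overlap_similarity(tokens_a, tokens_b):
    """Shared-word count normalized by the log-lengths of both sentences."""
    # Assumption: sentences with fewer than two tokens are ignored, which
    # also avoids a zero denominator when both have a single token.
    if len(tokens_a) < 2 or len(tokens_b) < 2:
        return 0.0
    shared = len(set(tokens_a) & set(tokens_b))
    return shared / (math.log(len(tokens_a)) + math.log(len(tokens_b)))


def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences by power iteration on the weighted similarity graph."""
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w = overlap_similarity(tokens[i], tokens[j])
            weights[i][j] = weights[j][i] = w
    # Total edge weight leaving each sentence, used to normalize its votes.
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            rank = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            new_scores.append((1 - damping) + damping * rank)
        scores = new_scores
    return scores


def top_sentences(sentences, k=2):
    """Return the k highest-scoring sentences in their original order."""
    scores = textrank(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]
```

In a pipeline like the one the abstract describes, the sentences selected this way would presumably be appended to the title and abstract to form the "long text" input. How many sentences the authors keep, and which similarity measure they use, is not stated here.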

Acknowledgements

The authors warmly thank the reviewers for their valuable suggestions. This research was partly supported by the Basic and Applied Basic Research Fund of Guangdong Province (No. 2019B1515120085), the NSFC projects (No. 71603295 and No. 71263006), and the Guangdong Natural Science Foundation (No. 2016A030313334).

Funding

Basic and Applied Basic Research Fund of Guangdong Province (Grant No. 2019B1515120085); Natural Science Foundation of Guangdong Province (Grant No. 2016A030313334); National Natural Science Foundation of China (Grant Nos. 71603295, 71263006).

Author information

Authors and Affiliations

School of Information Management, Sun Yat-Sen University, Guangzhou, China
Yonghe Lu, Meilu Yuan, Jiaxin Liu & Minghong Chen

Corresponding author

Correspondence to Minghong Chen.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Lu, Y., Yuan, M., Liu, J. et al. Research on semantic representation and citation recommendation of scientific papers with multiple semantics fusion. Scientometrics 128, 1367–1393 (2023). https://doi.org/10.1007/s11192-022-04566-5

Received: 14 February 2022
Accepted: 13 October 2022
Published: 03 January 2023
Issue Date: February 2023
DOI: https://doi.org/10.1007/s11192-022-04566-5
