
LLMs for knowledge graph construction and reasoning: recent capabilities and future opportunities

World Wide Web (2024)

Abstract

This paper presents an exhaustive quantitative and qualitative evaluation of Large Language Models (LLMs) for Knowledge Graph (KG) construction and reasoning. We conduct experiments across eight diverse datasets, focusing on four representative tasks encompassing entity and relation extraction, event extraction, link prediction, and question answering, thereby thoroughly exploring LLMs’ performance in the domain of construction and inference. Empirically, our findings suggest that LLMs, represented by GPT-4, are more suited as inference assistants than as few-shot information extractors. Specifically, while GPT-4 exhibits good performance in tasks related to KG construction, it excels further in reasoning tasks, surpassing fine-tuned models in certain cases. Moreover, our investigation extends to the potential generalization ability of LLMs for information extraction, leading to the proposition of a Virtual Knowledge Extraction task and the development of the corresponding VINE dataset. Based on these empirical findings, we further propose AutoKG, a multi-agent-based approach employing LLMs and external sources for KG construction and reasoning. We anticipate that this research can provide invaluable insights for future undertakings in the field of knowledge graphs.


Data and materials availability

Our data and materials, including code and datasets, are available at https://github.com/zjunlp/AutoKG.

Notes

  1. The code and datasets are in https://github.com/zjunlp/AutoKG.

  2. https://github.com/PaddlePaddle/PaddleNLP/tree/develop/examples/information_extraction/DuIE.

  3. https://www.kdnuggets.com/publications/sheets/ChatGPT_Cheatsheet_Costa.pdf

  4. https://github.com/Significant-Gravitas/Auto-GPT

  5. https://www.nytimes.com/2022/01/31/learning/february-vocabulary-challenge-invent-a-word.html

  6. https://www.nytimes.com/2023/02/01/learning/student-vocabulary-challenge-invent-a-word.html

References

  1. Cai, B., Xiang, Y., Gao, L., Zhang, H., Li, Y., Li, J.: Temporal knowledge graph completion: A survey. In: Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, IJCAI 2023, 19th-25th August 2023, Macao, SAR, China, pp. 6545–6553 (2023). https://doi.org/10.24963/IJCAI.2023/734

  2. Zhu, X., Li, Z., Wang, X., Jiang, X., Sun, P., Wang, X., Xiao, Y., Yuan, N.J.: Multi-modal knowledge graph construction and application: A survey. IEEE Trans. Knowl. Data Eng. 36(2), 715–735 (2024). https://doi.org/10.1109/TKDE.2022.3224228

  3. Liang, K., Meng, L., Liu, M., Liu, Y., Tu, W., Wang, S., Zhou, S., Liu, X., Sun, F.: Reasoning over different types of knowledge graphs: Static, temporal and multi-modal. CoRR (2022). https://doi.org/10.48550/arXiv.2212.05767

  4. Chen, X., Zhang, J., Wang, X., Wu, T., Deng, S., Wang, Y., Si, L., Chen, H., Zhang, N.: Continual multimodal knowledge graph construction. CoRR (2023). https://doi.org/10.48550/arXiv.2305.08698

  5. Pan, S., Luo, L., Wang, Y., Chen, C., Wang, J., Wu, X.: Unifying large language models and knowledge graphs: A roadmap. IEEE Trans. Knowl. Data Eng. 36(7), 3580–3599 (2024). https://doi.org/10.1109/TKDE.2024.3352100

  6. Pan, J.Z., Razniewski, S., Kalo, J., Singhania, S., Chen, J., Dietze, S., Jabeen, H., Omeliyanenko, J., Zhang, W., Lissandrini, M., Biswas, R., Melo, G., Bonifati, A., Vakaj, E., Dragoni, M., Graux, D.: Large language models and knowledge graphs: Opportunities and challenges. TGDK 1(1), 2:1–2:38 (2023). https://doi.org/10.4230/TGDK.1.1.2

  7. Ye, H., Zhang, N., Chen, H., Chen, H.: Generative knowledge graph construction: A review. In: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, Abu Dhabi, United Arab Emirates, December 7-11, 2022, pp. 1–17 (2022). https://doi.org/10.18653/v1/2022.emnlp-main.1

  8. Ding, L., Zhou, S., Xiao, J., Han, J.: Automated construction of theme-specific knowledge graphs. CoRR (2024). https://doi.org/10.48550/ARXIV.2404.19146

  9. Chiu, J.P.C., Nichols, E.: Named entity recognition with bidirectional lstm-cnns. Trans. Assoc. Comput. Linguistics 4, 357–370 (2016). https://doi.org/10.1162/tacl_a_00104

  10. Gui, H., Yuan, L., Ye, H., Zhang, N., Sun, M., Liang, L., Chen, H.: Iepile: Unearthing large-scale schema-based information extraction corpus. CoRR (2024). https://doi.org/10.48550/ARXIV.2402.14710

  11. Zeng, D., Liu, K., Chen, Y., Zhao, J.: Distant supervision for relation extraction via piecewise convolutional neural networks. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, Lisbon, Portugal, September 17-21, 2015, pp. 1753–1762 (2015). https://doi.org/10.18653/v1/d15-1203

  12. Chen, X., Zhang, N., Xie, X., Deng, S., Yao, Y., Tan, C., Huang, F., Si, L., Chen, H.: Knowprompt: Knowledge-aware prompt-tuning with synergistic optimization for relation extraction. In: Laforest, F., Troncy, R., Simperl, E., Agarwal, D., Gionis, A., Herman, I., Médini, L. (eds.) WWW ’22: The ACM Web Conference 2022, Virtual Event, Lyon, France, April 25 - 29, 2022, pp. 2778–2788 (2022). https://doi.org/10.1145/3485447.3511998

  13. Chen, Y., Xu, L., Liu, K., Zeng, D., Zhao, J.: Event extraction via dynamic multi-pooling convolutional neural networks. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, ACL 2015, July 26-31, 2015, Beijing, China, Volume 1: Long Papers, pp. 167–176 (2015). https://doi.org/10.3115/v1/p15-1017

  14. Deng, S., Zhang, N., Kang, J., Zhang, Y., Zhang, W., Chen, H.: Meta-learning with dynamic-memory-based prototypical network for few-shot event detection. In: Caverlee, J., Hu, X.B., Lalmas, M., Wang, W. (eds.) WSDM ’20: The Thirteenth ACM International Conference on Web Search and Data Mining, Houston, TX, USA, February 3-7, 2020, pp. 151–159 (2020). https://doi.org/10.1145/3336191.3371796

  15. Shen, W., Wang, J., Han, J.: Entity linking with a knowledge base: Issues, techniques, and solutions. IEEE Trans. Knowl. Data Eng. 27(2), 443–460 (2015). https://doi.org/10.1109/TKDE.2014.2327028

  16. Zhang, Y., Dai, H., Kozareva, Z., Smola, A.J., Song, L.: Variational reasoning for question answering with knowledge graph. In: McIlraith, S.A., Weinberger, K.Q. (eds.) Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, (AAAI-18), the 30th Innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2-7, 2018 (2018). https://doi.org/10.1609/aaai.v32i1.12057

  17. Rossi, A., Barbosa, D., Firmani, D., Matinata, A., Merialdo, P.: Knowledge graph embedding for link prediction: A comparative analysis. ACM Trans. Knowl. Discov. Data 15(2), 14:1–14:49 (2021). https://doi.org/10.1145/3424672

  18. Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense passage retrieval for open-domain question answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, Online, November 16-20, 2020 (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550

  19. Zhu, F., Lei, W., Wang, C., Zheng, J., Poria, S., Chua, T.: Retrieving and reading: A comprehensive survey on open-domain question answering. CoRR (2021)

  20. OpenAI: GPT-4 technical report. CoRR (2023). arxiv:2303.08774. https://doi.org/10.48550/arXiv.2303.08774

  21. Liu, A., Hu, X., Wen, L., Yu, P.S.: A comprehensive evaluation of chatgpt’s zero-shot text-to-sql capability. CoRR (2023). https://doi.org/10.48550/arXiv.2303.13547

  22. Shakarian, P., Koyyalamudi, A., Ngu, N., Mareedu, L.: An independent evaluation of chatgpt on mathematical word problems (MWP). In: Martin, A., Fill, H., Gerber, A., Hinkelmann, K., Lenat, D., Stolle, R., Harmelen, F. (eds.) Proceedings of the AAAI 2023 Spring Symposium on Challenges Requiring the Combination of Machine Learning and Knowledge Engineering (AAAI-MAKE 2023), Hyatt Regency, San Francisco Airport, California, USA, March 27-29, 2023. CEUR Workshop Proceedings, vol. 3433 (2023)

  23. Lai, V.D., Ngo, N.T., Veyseh, A.P.B., Man, H., Dernoncourt, F., Bui, T., Nguyen, T.H.: Chatgpt beyond english: Towards a comprehensive evaluation of large language models in multilingual learning. In: Bouamor, H., Pino, J., Bali, K. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, December 6-10, 2023 (2023). https://doi.org/10.18653/v1/2023.findings-emnlp.878

  24. Zhao, W.X., Zhou, K., Li, J., Tang, T., Wang, X., Hou, Y., Min, Y., Zhang, B., Zhang, J., Dong, Z., Du, Y., Yang, C., Chen, Y., Chen, Z., Jiang, J., Ren, R., Li, Y., Tang, X., Liu, Z., Liu, P., Nie, J., Wen, J.: A survey of large language models. CoRR (2023). https://doi.org/10.48550/arXiv.2303.18223

  25. Wei, X., Cui, X., Cheng, N., Wang, X., Zhang, X., Huang, S., Xie, P., Xu, J., Chen, Y., Zhang, M., Jiang, Y., Han, W.: Zero-shot information extraction via chatting with chatgpt. CoRR (2023). arxiv:2302.10205. https://doi.org/10.48550/arXiv.2302.10205

  26. Li, B., Fang, G., Yang, Y., Wang, Q., Ye, W., Zhao, W., Zhang, S.: Evaluating chatgpt’s information extraction capabilities: An assessment of performance, explainability, calibration, and faithfulness. CoRR (2023)

  27. Li, G., Wang, P., Ke, W.: Revisiting large language models as zero-shot relation extractors. In: Bouamor, H., Pino, J., Bali, K. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, December 6-10, 2023 (2023). https://doi.org/10.18653/v1/2023.findings-emnlp.459

  28. Wan, Z., Cheng, F., Mao, Z., Liu, Q., Song, H., Li, J., Kurohashi, S.: GPT-RE: in-context learning for relation extraction using large language models. In: Bouamor, H., Pino, J., Bali, K. (eds.) Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023, Singapore, December 6-10, 2023 (2023). https://doi.org/10.18653/v1/2023.emnlp-main.214

  29. Qin, C., Zhang, A., Zhang, Z., Chen, J., Yasunaga, M., Yang, D.: Is chatgpt a general-purpose natural language processing task solver? In: Bouamor, H., Pino, J., Bali, K. (eds.) Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023, Singapore, December 6-10, 2023 (2023). https://doi.org/10.18653/v1/2023.emnlp-main.85

  30. Liu, H., Ning, R., Teng, Z., Liu, J., Zhou, Q., Zhang, Y.: Evaluating the logical reasoning ability of chatgpt and GPT-4. CoRR (2023). https://doi.org/10.48550/ARXIV.2304.03439

  31. Jiang, J., Zhou, K., Zhao, W.X., Song, Y., Zhu, C., Zhu, H., Wen, J.: Kg-agent: An efficient autonomous agent framework for complex reasoning over knowledge graph. CoRR (2024). https://doi.org/10.48550/ARXIV.2402.11163

  32. Longpre, S., Hou, L., Vu, T., Webson, A., Chung, H.W., Tay, Y., Zhou, D., Le, Q.V., Zoph, B., Wei, J., Roberts, A.: The flan collection: Designing data and methods for effective instruction tuning. In: Krause, A., Brunskill, E., Cho, K., Engelhardt, B., Sabato, S., Scarlett, J. (eds.) International Conference on Machine Learning, ICML 2023, 23-29 July 2023, Honolulu, Hawaii, USA. Proceedings of Machine Learning Research (2023)

  33. Christiano, P.F., Leike, J., Brown, T.B., Martic, M., Legg, S., Amodei, D.: Deep reinforcement learning from human preferences. In: Guyon, I., Luxburg, U., Bengio, S., Wallach, H.M., Fergus, R., Vishwanathan, S.V.N., Garnett, R. (eds.) Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA (2017)

  34. Leiter, C., Zhang, R., Chen, Y., Belouadi, J., Larionov, D., Fresen, V., Eger, S.: Chatgpt: A meta-analysis after 2.5 months. CoRR (2023). https://doi.org/10.48550/ARXIV.2302.13795

  35. Yang, J., Jin, H., Tang, R., Han, X., Feng, Q., Jiang, H., Zhong, S., Yin, B., Hu, X.B.: Harnessing the power of llms in practice: A survey on chatgpt and beyond. ACM Trans. Knowl. Discov. Data 18(6), 160:1–160:32 (2024). https://doi.org/10.1145/3649506

  36. Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E.H., Le, Q.V., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models. In: Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., Oh, A. (eds.) Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022 (2022)

  37. Wang, Z., Zhang, G., Yang, K., Shi, N., Zhou, W., Hao, S., Xiong, G., Li, Y., Sim, M.Y., Chen, X., Zhu, Q., Yang, Z., Nik, A., Liu, Q., Lin, C., Wang, S., Liu, R., Chen, W., Xu, K., Liu, D., Guo, Y., Fu, J.: Interactive natural language processing. CoRR (2023). https://doi.org/10.48550/arXiv.2305.13246

  38. Bubeck, S., Chandrasekaran, V., Eldan, R., Gehrke, J., Horvitz, E., Kamar, E., Lee, P., Lee, Y.T., Li, Y., Lundberg, S.M., Nori, H., Palangi, H., Ribeiro, M.T., Zhang, Y.: Sparks of artificial general intelligence: Early experiments with GPT-4. CoRR (2023). https://doi.org/10.48550/ARXIV.2303.12712

  39. Li, S., He, W., Shi, Y., Jiang, W., Liang, H., Jiang, Y., Zhang, Y., Lyu, Y., Zhu, Y.: Duie: A large-scale chinese dataset for information extraction. In: Tang, J., Kan, M., Zhao, D., Li, S., Zan, H. (eds.) Natural Language Processing and Chinese Computing - 8th CCF International Conference, NLPCC 2019, Dunhuang, China, October 9-14, 2019, Proceedings, Part II. Lecture Notes in Computer Science, vol. 11839, pp. 791–800 (2019). https://doi.org/10.1007/978-3-030-32236-6_72

  40. Luan, Y., He, L., Ostendorf, M., Hajishirzi, H.: Multi-task identification of entities, relations, and coreference for scientific knowledge graph construction. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31 - November 4, 2018, pp. 3219–3232 (2018). https://doi.org/10.18653/v1/d18-1360

  41. Stoica, G., Platanios, E.A., Póczos, B.: Re-tacred: Addressing shortcomings of the TACRED dataset. In: Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advances in Artificial Intelligence, EAAI 2021, Virtual Event, February 2-9, 2021, pp. 13843–13850 (2021)

  42. Wang, X., Wang, Z., Han, X., Jiang, W., Han, R., Liu, Z., Li, J., Li, P., Lin, Y., Zhou, J.: MAVEN: A massive general domain event detection dataset. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, Online, November 16-20, 2020, pp. 1652–1671 (2020). https://doi.org/10.18653/v1/2020.emnlp-main.129

  43. Toutanova, K., Chen, D., Pantel, P., Poon, H., Choudhury, P., Gamon, M.: Representing text for joint embedding of text and knowledge bases. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, Lisbon, Portugal, September 17-21, 2015, pp. 1499–1509 (2015). https://doi.org/10.18653/v1/d15-1174

  44. Hwang, J.D., Bhagavatula, C., Bras, R.L., Da, J., Sakaguchi, K., Bosselut, A., Choi, Y.: (comet-) atomic 2020: On symbolic and neural commonsense knowledge graphs. In: Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advances in Artificial Intelligence, EAAI 2021, Virtual Event, February 2-9, 2021, pp. 6384–6392 (2021)

  45. Jiang, K., Wu, D., Jiang, H.: Freebaseqa: A new factoid QA data set matching trivia-style question-answer pairs with freebase. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 318–323 (2019). https://doi.org/10.18653/v1/n19-1028

  46. Ye, D., Lin, Y., Li, P., Sun, M.: Packed levitated marker for entity and relation extraction. In: Muresan, S., Nakov, P., Villavicencio, A. (eds.) Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2022, Dublin, Ireland, May 22-27, 2022, pp. 4904–4917 (2022). https://doi.org/10.18653/v1/2022.acl-long.337

  47. Park, S., Kim, H.: Improving sentence-level relation extraction through curriculum learning. CoRR (2021)

  48. Wang, S., Yu, M., Huang, L.: The art of prompting: Event detection based on type specific prompts. In: Rogers, A., Boyd-Graber, J.L., Okazaki, N. (eds.) Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), ACL 2023, Toronto, Canada, July 9-14, 2023, pp. 1286–1299 (2023). https://doi.org/10.18653/v1/2023.acl-short.111

  49. Wang, X., He, Q., Liang, J., Xiao, Y.: Language models as knowledge embeddings. In: Raedt, L.D. (ed.) Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI 2022, Vienna, Austria, 23-29 July 2022, pp. 2291–2297 (2022). https://doi.org/10.24963/ijcai.2022/318

  50. Hwang, J.D., Bhagavatula, C., Bras, R.L., Da, J., Sakaguchi, K., Bosselut, A., Choi, Y.: (comet-) atomic 2020: On symbolic and neural commonsense knowledge graphs. In: Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advances in Artificial Intelligence, EAAI 2021, Virtual Event, February 2-9, 2021, pp. 6384–6392 (2021). https://doi.org/10.1609/aaai.v35i7.16792

  51. Yu, D., Zhang, S., Ng, P., Zhu, H., Li, A.H., Wang, J., Hu, Y., Wang, W.Y., Wang, Z., Xiang, B.: Decaf: Joint decoding of answers and logical forms for question answering over knowledge bases. In: The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023 (2023)

  52. Madani, N., Joseph, K.: Answering questions over knowledge graphs using logic programming along with language models. In: Maughan, K., Liu, R., Burns, T.F. (eds.) The First Tiny Papers Track at ICLR 2023, Tiny Papers @ ICLR 2023, Kigali, Rwanda, May 5, 2023 (2023)

  53. Gao, J., Zhao, H., Yu, C., Xu, R.: Exploring the feasibility of chatgpt for event extraction. CoRR (2023). https://doi.org/10.48550/arXiv.2303.03836

  54. Dong, Q., Li, L., Dai, D., Zheng, C., Wu, Z., Chang, B., Sun, X., Xu, J., Li, L., Sui, Z.: A survey for in-context learning. CoRR (2023). https://doi.org/10.48550/ARXIV.2301.00234

  55. Wei, J.W., Wei, J., Tay, Y., Tran, D., Webson, A., Lu, Y., Chen, X., Liu, H., Huang, D., Zhou, D., Ma, T.: Larger language models do in-context learning differently. CoRR (2023)

  56. Wang, L., Ma, C., Feng, X., Zhang, Z., Yang, H., Zhang, J., Chen, Z., Tang, J., Chen, X., Lin, Y., Zhao, W.X., Wei, Z., Wen, J.: A survey on large language model based autonomous agents. Frontiers Comput. Sci. 18(6), 186345 (2024). https://doi.org/10.1007/S11704-024-40231-1

  57. Xi, Z., Chen, W., Guo, X., He, W., Ding, Y., Hong, B., Zhang, M., Wang, J., Jin, S., Zhou, E., Zheng, R., Fan, X., Wang, X., Xiong, L., Zhou, Y., Wang, W., Jiang, C., Zou, Y., Liu, X., Yin, Z., Dou, S., Weng, R., Cheng, W., Zhang, Q., Qin, W., Zheng, Y., Qiu, X., Huan, X., Gui, T.: The rise and potential of large language model based agents: A survey. CoRR (2023). https://doi.org/10.48550/arXiv.2309.07864

  58. Zhao, P., Jin, Z., Cheng, N.: An in-depth survey of large language model-based artificial intelligence agents. CoRR (2023). https://doi.org/10.48550/arXiv.2309.14365

  59. Li, G., Hammoud, H.A.A.K., Itani, H., Khizbullin, D., Ghanem, B.: CAMEL: communicative agents for “mind” exploration of large scale language model society. CoRR (2023). https://doi.org/10.48550/arXiv.2303.17760

  60. Brown, T.B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D.M., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I., Amodei, D.: Language models are few-shot learners. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., Lin, H. (eds.) Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, Virtual (2020). https://proceedings.neurips.cc/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html

  61. Wei, J., Tay, Y., Bommasani, R., Raffel, C., Zoph, B., Borgeaud, S., Yogatama, D., Bosma, M., Zhou, D., Metzler, D., Chi, E.H., Hashimoto, T., Vinyals, O., Liang, P., Dean, J., Fedus, W.: Emergent abilities of large language models. Trans. Mach. Learn. Res. 2022 (2022)

  62. Bang, Y., Cahyawijaya, S., Lee, N., Dai, W., Su, D., Wilie, B., Lovenia, H., Ji, Z., Yu, T., Chung, W., Do, Q.V., Xu, Y., Fung, P.: A multitask, multilingual, multimodal evaluation of chatgpt on reasoning, hallucination, and interactivity. In: Park, J.C., Arase, Y., Hu, B., Lu, W., Wijaya, D., Purwarianti, A., Krisnadhi, A.A. (eds.) Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, IJCNLP 2023 -Volume 1: Long Papers, Nusa Dua, Bali, November 1 - 4, 2023, pp. 675–718 (2023). https://doi.org/10.18653/v1/2023.ijcnlp-main.45

  63. Nori, H., King, N., McKinney, S.M., Carignan, D., Horvitz, E.: Capabilities of GPT-4 on medical challenge problems. CoRR (2023). https://doi.org/10.48550/ARXIV.2303.13375

  64. Qiao, S., Ou, Y., Zhang, N., Chen, X., Yao, Y., Deng, S., Tan, C., Huang, F., Chen, H.: Reasoning with language model prompting: A survey. In: Rogers, A., Boyd-Graber, J.L., Okazaki, N. (eds.) Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2023, Toronto, Canada, July 9-14, 2023, pp. 5368–5393 (2023). https://doi.org/10.18653/v1/2023.acl-long.294

  65. Sánchez, R.J., Conrads, L., Welke, P., Cvejoski, K., Marin, C.O.: Hidden schema networks. In: Rogers, A., Boyd-Graber, J.L., Okazaki, N. (eds.) Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2023, Toronto, Canada, July 9-14, 2023, pp. 4764–4798 (2023). https://doi.org/10.18653/v1/2023.acl-long.263

  66. Ma, Y., Cao, Y., Hong, Y., Sun, A.: Large language model is not a good few-shot information extractor, but a good reranker for hard samples! In: Bouamor, H., Pino, J., Bali, K. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, December 6-10, 2023, pp. 10572–10601 (2023). https://doi.org/10.18653/v1/2023.findings-emnlp.710

  67. Jeblick, K., Schachtner, B., Dexl, J., Mittermeier, A., Stüber, A.T., Topalis, J., Weber, T., Wesp, P., Sabel, B.O., Ricke, J., Ingrisch, M.: Chatgpt makes medicine easy to swallow: An exploratory case study on simplified radiology reports. CoRR (2022). https://doi.org/10.48550/arXiv.2212.14882

  68. Tan, Y., Min, D., Li, Y., Li, W., Hu, N., Chen, Y., Qi, G.: Evaluation of chatgpt as a question answering system for answering complex questions. CoRR (2023). https://doi.org/10.48550/arXiv.2303.07992

  69. Jiao, W., Wang, W., Huang, J., Wang, X., Tu, Z.: Is chatgpt A good translator? A preliminary study. CoRR (2023). https://doi.org/10.48550/arXiv.2301.08745

  70. Kasai, J., Kasai, Y., Sakaguchi, K., Yamada, Y., Radev, D.: Evaluating GPT-4 and chatgpt on japanese medical licensing examinations. CoRR (2023). https://doi.org/10.48550/ARXIV.2303.18027

  71. Sifatkaur, Singh, M., B, V.S., Malviya, N.: Mind meets machine: Unravelling gpt-4’s cognitive psychology. CoRR (2023). https://doi.org/10.48550/arXiv.2303.11436

  72. Nunes, D., Primi, R., Pires, R., Alencar Lotufo, R., Nogueira, R.F.: Evaluating GPT-3.5 and GPT-4 models on brazilian university admission exams. CoRR (2023). https://doi.org/10.48550/arXiv.2303.17003

  73. Lyu, Q., Tan, J., Zapadka, M.E., Ponnatapuram, J., Niu, C., Wang, G., Whitlow, C.T.: Translating radiology reports into plain language using chatgpt and GPT-4 with prompt learning: Promising results, limitations, and potential. Vis. Comput. Ind. Biomed. Art 6, 9 (2023). https://doi.org/10.1186/s42492-023-00136-5

  74. Li, D., Tan, Z., Chen, T., Liu, H.: Contextualization distillation from large language model for knowledge graph completion. In: Graham, Y., Purver, M. (eds.) Findings of the Association for Computational Linguistics: EACL 2024, St. Julian’s, Malta, March 17-22, 2024, pp. 458–477 (2024). https://aclanthology.org/2024.findings-eacl.32

  75. Li, F., Lin, Z., Zhang, M., Ji, D.: A span-based model for joint overlapped and discontinuous named entity recognition. In: Zong, C., Xia, F., Li, W., Navigli, R. (eds.) Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL/IJCNLP 2021, (Volume 1: Long Papers), Virtual Event, August 1-6, 2021, pp. 4814–4828 (2021). https://doi.org/10.18653/v1/2021.acl-long.372

  76. Zhou, W., Zhang, S., Gu, Y., Chen, M., Poon, H.: Universalner: Targeted distillation from large language models for open named entity recognition. In: The Twelfth International Conference on Learning Representations, ICLR 2024 (2024). https://openreview.net/forum?id=r65xfUb76p

  77. Jiang, P., Lin, J., Wang, Z., Sun, J., Han, J.: GenRES: Rethinking evaluation for generative relation extraction in the era of large language models. In: Duh, K., Gomez, H., Bethard, S. (eds.) Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pp. 2820–2837. Association for Computational Linguistics, Mexico City, Mexico (2024). https://aclanthology.org/2024.naacl-long.155

  78. Wang, L., Zhao, W., Wei, Z., Liu, J.: Simkgc: Simple contrastive knowledge graph completion with pre-trained language models. In: Muresan, S., Nakov, P., Villavicencio, A. (eds.) Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2022, Dublin, Ireland, May 22-27, 2022, pp. 4281–4294 (2022). https://doi.org/10.18653/v1/2022.acl-long.295

  79. Li, D., Zhu, B., Yang, S., Xu, K., Yi, M., He, Y., Wang, H.: Multi-task pre-training language model for semantic network completion. ACM Trans. Asian Low Resour. Lang. Inf. Process. 22(11), 250:1–250:20 (2023). https://doi.org/10.1145/3627704

  80. Shu, D., Chen, T., Jin, M., Zhang, Y., Zhang, C., Du, M., Zhang, Y.: Knowledge graph large language model (KG-LLM) for link prediction. CoRR (2024). https://doi.org/10.48550/ARXIV.2403.07311

  81. Hao, S., Tan, B., Tang, K., Ni, B., Shao, X., Zhang, H., Xing, E.P., Hu, Z.: Bertnet: Harvesting knowledge graphs with arbitrary relations from pretrained language models. In: Rogers, A., Boyd-Graber, J.L., Okazaki, N. (eds.) Findings of the Association for Computational Linguistics: ACL 2023, Toronto, Canada, July 9-14, 2023, pp. 5000–5015 (2023). https://doi.org/10.18653/v1/2023.findings-acl.309

  82. Petroni, F., Rocktäschel, T., Riedel, S., Lewis, P.S.H., Bakhtin, A., Wu, Y., Miller, A.H.: Language models as knowledge bases? In: Inui, K., Jiang, J., Ng, V., Wan, X. (eds.) Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, Hong Kong, China, November 3-7, 2019, pp. 2463–2473 (2019). https://doi.org/10.18653/v1/D19-1250

  83. AlKhamissi, B., Li, M., Celikyilmaz, A., Diab, M.T., Ghazvininejad, M.: A review on language models as knowledge bases. CoRR (2022). https://doi.org/10.48550/ARXIV.2204.06031

  84. West, P., Bhagavatula, C., Hessel, J., Hwang, J.D., Jiang, L., Bras, R.L., Lu, X., Welleck, S., Choi, Y.: Symbolic knowledge distillation: from general language models to commonsense models. In: Carpuat, M., Marneffe, M., Ruíz, I.V.M. (eds.) Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2022, Seattle, WA, United States, July 10-15, 2022, pp. 4602–4625 (2022). https://doi.org/10.18653/v1/2022.naacl-main.341

  85. Luo, L., Ju, J., Xiong, B., Li, Y., Haffari, G., Pan, S.: Chatrule: Mining logical rules with large language models for knowledge graph reasoning. CoRR (2023). https://doi.org/10.48550/ARXIV.2309.01538

  86. Miller, A.H., Fisch, A., Dodge, J., Karimi, A., Bordes, A., Weston, J.: Key-value memory networks for directly reading documents. In: Su, J., Carreras, X., Duh, K. (eds.) Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas, USA, November 1-4, 2016, pp. 1400–1409 (2016). https://doi.org/10.18653/v1/d16-1147

Download references

Funding

No funding was received to assist with the preparation of our work.

Author information

Authors and Affiliations

Authors

Contributions

Yuqi Zhu, Xiaohan Wang and Jing Chen wrote the main manuscript text. All authors reviewed the manuscript.

Corresponding author

Correspondence to Ningyu Zhang.

Ethics declarations

Ethics approval and consent to participate

This work did not involve any human participants, their data, or biological materials, and therefore did not require ethical approval.

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Neuro-Symbolic Intelligence: Large Language Model Enabled Knowledge Engineering

Guest Editors: Haofen Wang, Arijit Khan, Jun Liu, and Michael Witbrock

Appendices

Appendix A: Related work

1.1 A.1 Large language models

LLMs are pre-trained on substantial amounts of textual data and have become a significant component of contemporary NLP research. Recent advancements in NLP have led to the development of highly capable LLMs, such as GPT-3 [60], ChatGPT, and GPT-4, which exhibit exceptional performance across a diverse array of NLP tasks, including machine translation, text summarization, and question answering. Concurrently, several previous studies have indicated that LLMs can achieve remarkable results in relevant downstream tasks with minimal or even no demonstrations in the prompt [25, 61,62,63,64]. Sánchez et al. [65] proposes a novel neural language model that incorporates inductive biases to enforce explicit relational structures in the representations of pretrained language models, providing further evidence of the robustness and generality of LLMs.
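To make the zero-shot and one-shot settings concrete, the following is a minimal sketch of how such prompts can be assembled for an extraction task and sent to a model. The client usage follows the current OpenAI Python SDK; the instruction wording, candidate relation set, and demonstration are our own illustrative assumptions, not the exact prompts of this paper (those are listed in Appendix D).

    # Minimal sketch: zero-shot vs. one-shot prompting for relation extraction.
    # Prompt wording and the candidate relation set are illustrative assumptions.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    TASK = ("Extract (head entity, relation, tail entity) triples from the sentence. "
            "Candidate relations: founded_by, located_in, works_for.")

    DEMO = ('Sentence: "Steve Jobs co-founded Apple in Cupertino."\n'
            'Triples: [("Apple", "founded_by", "Steve Jobs"), '
            '("Apple", "located_in", "Cupertino")]')

    def build_prompt(sentence: str, one_shot: bool = False) -> str:
        # Zero-shot: instruction + query only; one-shot: prepend one demonstration.
        parts = [TASK]
        if one_shot:
            parts.append(DEMO)
        parts.append(f'Sentence: "{sentence}"\nTriples:')
        return "\n\n".join(parts)

    def extract(sentence: str, one_shot: bool = False) -> str:
        response = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": build_prompt(sentence, one_shot)}],
            temperature=0,  # deterministic decoding for evaluation
        )
        return response.choices[0].message.content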

1.2 A.2 ChatGPT & GPT-4

ChatGPT, an advanced LLM developed by OpenAI, is primarily designed for engaging in human-like conversations. During the fine-tuning process, ChatGPT utilizes RLHF [33], thereby enhancing its alignment with human preferences and values.

As a cutting-edge large language model developed by OpenAI, GPT-4 builds upon the successes of its predecessors such as GPT-3 and ChatGPT. Trained on an unparalleled scale of computation and data, it exhibits remarkable generalization, inference, and problem-solving capabilities across diverse domains. In addition, as a large-scale multimodal model, GPT-4 is capable of processing both image and text inputs. In general, the public release of GPT-4 offers fresh insights into the future advancement of LLMs and presents novel opportunities and challenges within the realm of NLP.

With the popularity of LLMs, an increasing number of researchers are exploring the specific emergent capabilities and advantages they possess [66]. Bang et al. [62] performs an in-depth analysis of ChatGPT along multitask, multilingual, and multimodal dimensions. The findings indicate that ChatGPT excels at zero-shot learning across various tasks, even outperforming fine-tuned models in certain cases, yet it struggles to generalize to low-resource languages. Furthermore, in terms of multimodality, the capabilities of ChatGPT remain rudimentary compared to those of more advanced vision-language models. Moreover, ChatGPT has garnered considerable attention in various other domains, including information extraction [25, 53], reasoning [29], text summarization [67], question answering [68], and machine translation [69], showcasing its versatility and applicability in the broader field of natural language processing.

While there is a growing body of research on ChatGPT, investigations into GPT-4 continue to be relatively limited. Nori et al. [63] conducts an extensive assessment of GPT-4 on medical competency examinations and benchmark datasets and shows that GPT-4, without any specialized prompt crafting, surpasses the passing score by over 20 points. Kasai et al. [70] also studies GPT-4’s performance on the Japanese national medical licensing examinations. Furthermore, there are some studies on GPT-4 that focus on cognitive psychology [71], academic exams [72], and translation of radiology reports [73].

1.3 A.3 LLMs for KG

Many studies now leverage large language models to facilitate the construction of knowledge graphs [74]. Some of these works focus on specific subtasks within KG construction. For instance, LLMs are utilized for named entity recognition and classification [75, 76], leveraging their contextual understanding and linguistic knowledge. Furthermore, LLMs have also demonstrated utility in tasks such as relation extraction [25, 77] and link prediction [78,79,80]. In line with our approach, several studies have explored the use of LLMs as knowledge bases [74, 81,82,83] to support KG construction. For example, some researchers [84] propose a symbolic knowledge distillation framework that extracts symbolic knowledge from LLMs: they first extract commonsense facts from large LLMs like GPT-3, fine-tune smaller student LLMs on those facts, and then use the student models to generate KGs. Concurrently, ChatRule [85] uses LLMs to mine logical rules from KGs, addressing the computational intensity and scalability issues of existing methods; it generates rules with LLMs, integrating the semantic and structural information of KGs, and employs a rule ranking module to evaluate rule quality. These studies highlight the extensive potential of LLMs in KG construction, promoting the automation and intelligent development of this field.

Appendix B: Datasets

Entity, Relation and Event Extraction DuIE2.0 [39] is a substantial Chinese relation extraction dataset with more than 210,000 sentences and 48 predefined relation categories. SciERC [40] is a collection of scientific abstracts annotated with seven relation types. Re-TACRED [41], an upgraded version of the TACRED dataset, includes over 91,000 sentences across 40 relations. MAVEN [42] is a general-domain event detection benchmark containing 4,480 documents and 168 event types.

Link Prediction FB15K-237 [43] is widely used as a benchmark for assessing the performance of knowledge graph embedding models on link prediction, encompassing 237 relations and 14,541 entities. ATOMIC 2020 [44] serves as a comprehensive commonsense repository with 1.33 million inferential knowledge tuples about entities and events.

Question Answering FreebaseQA [45] is an open-domain QA dataset built on the Freebase knowledge graph, comprising trivia-style question-answer pairs collected from various sources. MetaQA [16], expanded from WikiMovies [86], provides a substantial collection of single-hop and multi-hop question-answer pairs, surpassing 400,000 in total.

Appendix C: Data collection of VINE

Since GPT-4’s training data only extends up to September 2021, we select a portion of participants’ responses from two competitions organized by the New York Times as part of our data sources. These competitions are the “February Vocabulary Challenge: Invent a Word” (Note 5), held in January 2022, and the “Student Vocabulary Challenge: Invent a Word” (Note 6), conducted in February 2023. Both competitions aim to promote the creation of distinctive and memorable new words that address gaps in the English language.

Our constructed dataset includes 1,400 sentences, 39 novel relations, and 786 unique entities. During construction, we ensure that each relation type has a minimum of 10 associated samples to facilitate subsequent experiments. Notably, we find that in the Re-TACRED test set, certain relation types have fewer than 10 corresponding data instances. To better conduct our experiments, we select sentences of the corresponding types from the training set to offset this deficiency.
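As a concrete illustration of the backfill step described above, the following sketch tops up any relation with fewer than 10 test instances using same-type sentences from the training split. The record layout (dictionaries with "sentence" and "relation" fields) and function names are assumptions made for illustration, not the authors’ released preprocessing code.

    # Sketch of the backfill step: guarantee >= 10 samples per relation by
    # drawing extra sentences of the same relation type from the training split.
    import random
    from collections import defaultdict

    MIN_SAMPLES = 10

    def backfill(test_set, train_set, seed=0):
        rng = random.Random(seed)
        by_rel_test = defaultdict(list)
        for ex in test_set:
            by_rel_test[ex["relation"]].append(ex)
        by_rel_train = defaultdict(list)
        for ex in train_set:
            by_rel_train[ex["relation"]].append(ex)

        filled = list(test_set)
        for rel, examples in by_rel_test.items():
            deficit = MIN_SAMPLES - len(examples)
            if deficit > 0:
                # Top up under-represented relations from the training split.
                pool = by_rel_train[rel]
                filled.extend(rng.sample(pool, min(deficit, len(pool))))
        return filled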

Appendix D: Prompts for evaluation

Here we list the prompts used for each task in our experiments (Tables 2, 3 and 4); illustrative examples of their general layout follow the table list below.

Table 2 Examples of zero-shot and one-shot prompts we used on Relation Extraction
Table 3 Examples of zero-shot and one-shot prompts we used on Event Detection, Link Prediction, and Question Answering
Table 4 Examples of virtual knowledge extraction
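
The following hypothetical examples illustrate the general layout such prompts follow; the wording is our own reconstruction for illustration, not the verbatim prompts of Tables 2, 3 and 4.

Relation Extraction (zero-shot): “The candidate relations are: [relation list]. Given the sentence ‘[sentence]’, extract all (subject, relation, object) triples.”

Relation Extraction (one-shot): the same instruction, preceded by one worked example consisting of a sentence and its gold triples.

Event Detection (zero-shot): “Given the candidate event types [type list], identify the trigger word in the sentence ‘[sentence]’ and classify its event type.”

Link Prediction (zero-shot): “Complete the triple ([head entity], [relation], ?) by predicting the missing tail entity.”

Question Answering (zero-shot): “Question: [question] Answer:”, where the model is expected to return the answer entity.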

Appendix E: Prompts for virtual knowledge extraction

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhu, Y., Wang, X., Chen, J. et al. LLMs for knowledge graph construction and reasoning: recent capabilities and future opportunities. World Wide Web 27, 58 (2024). https://doi.org/10.1007/s11280-024-01297-w

