Abstract
Knowledge graphs are crucial resources for a wide range of document management tasks, such as text retrieval, classification, and natural language inference. Standard examples are large-scale lexical-semantic graphs, such as WordNet, which are useful for purposes such as text tagging or sense disambiguation.
The dynamics of lexical taxonomies is a critical problem, as these resources must be maintained to follow the evolution of language over time. Taxonomy expansion thus becomes a crucial semantic task: it allows not only extending existing resources with new properties but also creating new entries, i.e., taxonomy concepts, when necessary. Previous work on this topic suggests the use of neural learning methods able to exploit the underlying taxonomy graph as a source of training evidence. This can be done through graph-based learning, where networks are trained to encode the underlying knowledge graph and to predict appropriate inferences.
This paper presents TaxoSBERT, a simple and effective way to model taxonomy expansion as a retrieval task. It combines a robust semantic similarity measure with taxonomy-driven re-ranking strategies. The method is unsupervised: the adopted similarity measures are trained on (large-scale) resources outside the target taxonomy and are extremely efficient. The experimental evaluation on two taxonomies shows surprising results, improving over far more complex state-of-the-art methods.
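The retrieval-and-re-rank idea described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the embeddings are toy placeholder vectors standing in for SBERT sentence encodings, and the blending weight `alpha`, the function names, and the toy taxonomy are all invented for the example.

```python
# Sketch of taxonomy expansion as retrieval (not the authors' exact code).
# A new concept is attached under the existing node whose description is
# most similar to it; a taxonomy-driven re-rank then also rewards candidates
# whose ancestors are similar to the query.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_parents(query_vec, node_vecs, parents, alpha=0.7):
    """Score each candidate parent by its own similarity to the query,
    blended with the similarity of its parent (a simple re-rank)."""
    scores = {}
    for node, vec in node_vecs.items():
        score = cosine(query_vec, vec)
        anc = parents.get(node)
        if anc is not None:
            score = alpha * score + (1 - alpha) * cosine(query_vec, node_vecs[anc])
        scores[node] = score
    return sorted(scores, key=scores.get, reverse=True)

# Toy taxonomy: "dog" and "car" under "entity"; the query resembles "dog".
node_vecs = {
    "entity": np.array([1.0, 1.0]),
    "dog":    np.array([1.0, 0.1]),
    "car":    np.array([0.1, 1.0]),
}
parents = {"dog": "entity", "car": "entity"}
query = np.array([0.9, 0.2])  # embedding of the new concept, e.g. "puppy"

print(rank_parents(query, node_vecs, parents)[0])  # -> dog
```

In the actual system the placeholder vectors would be produced by a sentence encoder trained outside the target taxonomy, which is what makes the approach unsupervised with respect to that taxonomy.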
Notes
- 1.
See WordNet for sense 1 of the noun “center field”.
- 2.
The source code is publicly available at https://github.com/crux82/TaxoSBERT.
- 3.
Consider that, when using state-of-the-art Transformer-based architectures, such as BERT-based ones, classifying a text pair requires jointly encoding the pair, which is a computationally expensive task.
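The cost argument in this note can be made concrete with a mock encoder that merely counts forward passes (a hypothetical stand-in, not a real Transformer): a pair classifier must re-encode every (query, candidate) pair, while a bi-encoder such as SBERT encodes each candidate once, offline, and answers queries with cheap vector comparisons.

```python
# Hypothetical counting encoder illustrating cross-encoder vs. bi-encoder cost.
class CountingEncoder:
    def __init__(self):
        self.forward_passes = 0

    def encode(self, text):
        self.forward_passes += 1
        return hash(text)  # stand-in for a dense embedding

candidates = [f"concept_{i}" for i in range(1000)]
queries = [f"query_{i}" for i in range(10)]

# Cross-encoder: every query/candidate pair is encoded jointly.
cross = CountingEncoder()
for q in queries:
    for c in candidates:
        cross.encode(q + " [SEP] " + c)

# Bi-encoder: candidates encoded once offline, then one pass per query.
bi = CountingEncoder()
index = [bi.encode(c) for c in candidates]
for q in queries:
    bi.encode(q)

print(cross.forward_passes, bi.forward_passes)  # 10000 vs 1010
```

With 1000 candidates and 10 queries, the pairwise scheme needs 10,000 encoder passes against 1,010 for the retrieval scheme, which is why the retrieval formulation is so efficient.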
- 4.
Notice that the hyponymy relation does not form a perfect tree in WordNet, as multiple inheritance is occasionally needed for some synsets (nodes). However, for the Taxonomy Enrichment task this assumption is always satisfied, so that, within the scope of this paper, our definition is fully consistent.
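The tree-versus-DAG distinction made in note 4 amounts to a simple structural check: a taxonomy is a strict tree if and only if no node has more than one parent. The sketch below uses a made-up fragment imitating WordNet's "person" synset (which inherits from both "organism" and "causal agent"), not real WordNet data.

```python
# Detect multiple inheritance in a hypernym graph (tree vs. DAG check).
from collections import defaultdict

def multi_parent_nodes(edges):
    """edges: iterable of (hypernym, hyponym) pairs.
    Returns the set of nodes with more than one parent."""
    parents = defaultdict(set)
    for parent, child in edges:
        parents[child].add(parent)
    return {node for node, ps in parents.items() if len(ps) > 1}

# Toy fragment imitating WordNet's structure around "person".
edges = [
    ("entity", "organism"),
    ("entity", "causal_agent"),
    ("organism", "person"),
    ("causal_agent", "person"),
]
print(multi_parent_nodes(edges))  # -> {'person'}: not a strict tree
```

An empty result would certify the tree assumption used in the paper's definitions; a non-empty one identifies exactly the synsets where the DAG deviates from a tree.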
References
Beltagy, I., Lo, K., Cohan, A.: SciBERT: a pretrained language model for scientific text (2019)
Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding (2018). https://doi.org/10.48550/ARXIV.1810.04805, https://arxiv.org/abs/1810.04805
Jiang, M., Song, X., Zhang, J., Han, J.: TaxoEnrich: self-supervised taxonomy completion via structure-semantic representations. In: Proceedings of the ACM Web Conference 2022. ACM (2022). https://doi.org/10.1145/3485447.3511935
Manzoor, E., Li, R., Shrouty, D., Leskovec, J.: Expanding taxonomies with implicit edge semantics. In: Proceedings of The Web Conference 2020, pp. 2044–2054. WWW ’20, Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3366423.3380271
Mao, Y., et al.: Octet: online catalog taxonomy enrichment with self-supervision. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2247–2257. KDD 2020, Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3394486.3403274
Miller, G.A.: WordNet: a lexical database for English. Commun. ACM 38(1), 39–41 (1995). https://doi.org/10.1145/219717.219748
van den Oord, A., Li, Y., Vinyals, O.: Representation learning with contrastive predictive coding (2019)
Reimers, N., Gurevych, I.: Sentence-BERT: sentence embeddings using Siamese BERT-networks. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics (2019), https://arxiv.org/abs/1908.10084
Roller, S., Kiela, D., Nickel, M.: Hearst patterns revisited: automatic hypernym detection from large text corpora. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pp. 358–363. Association for Computational Linguistics, Melbourne, Australia (2018). https://doi.org/10.18653/v1/P18-2057, https://aclanthology.org/P18-2057
Shen, J., Shen, Z., Xiong, C., Wang, C., Wang, K., Han, J.: TaxoExpan: self-supervised taxonomy expansion with position-enhanced graph neural network. In: Proceedings of The Web Conference 2020, pp. 486–497. WWW 2020, Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3366423.3380132
Sutskever, I., Salakhutdinov, R., Tenenbaum, J.B.: Modelling relational data using Bayesian clustered tensor factorization. In: Proceedings of the 22nd International Conference on Neural Information Processing Systems, pp. 1821–1828. NIPS 2009, Curran Associates Inc., Red Hook, NY, USA (2009)
Vaswani, A., et al.: Attention is all you need. In: Advances In Neural Information Processing Systems, vol. 30 (2017)
Vrandečić, D.: Wikidata: a new platform for collaborative data collection. In: Proceedings of the 21st International Conference on World Wide Web, pp. 1063–1064. WWW 2012 Companion, Association for Computing Machinery, New York, NY, USA (2012). https://doi.org/10.1145/2187980.2188242
Wolf, T., et al.: Transformers: state-of-the-art natural language processing. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pp. 38–45. Association for Computational Linguistics, Online (2020). https://www.aclweb.org/anthology/2020.emnlp-demos.6
Yang, C., Zhang, J., Han, J.: Co-embedding network nodes and hierarchical labels with taxonomy based generative adversarial networks. In: 2020 IEEE International Conference on Data Mining (ICDM), pp. 721–730 (2020). https://doi.org/10.1109/ICDM50108.2020.00081
Yu, Y., Li, Y., Shen, J., Feng, H., Sun, J., Zhang, C.: STEAM: self-supervised taxonomy expansion with mini-paths. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM (2020). https://doi.org/10.1145/3394486.3403145
Zeng, Q., Lin, J., Yu, W., Cleland-Huang, J., Jiang, M.: Enhancing taxonomy completion with concept generation via fusing relational representations. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery. ACM (2021). https://doi.org/10.1145/3447548.3467308
Zhang, J., et al.: Taxonomy completion via triplet matching network. In: Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advances in Artificial Intelligence, EAAI 2021, Virtual Event, 2–9 February (2021), pp. 4662–4670 (2021). https://ojs.aaai.org/index.php/AAAI/article/view/16596
Acknowledgements
We would like to thank the Istituto di Analisi dei Sistemi ed Informatica - Antonio Ruberti (IASI) for supporting the experimentations through access to dedicated computing resources. We acknowledge financial support from the PNRR MUR project PE0000013-FAIR.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Margiotta, D., Croce, D., Basili, R. (2023). TaxoSBERT: Unsupervised Taxonomy Expansion Through Expressive Semantic Similarity. In: Conte, D., Fred, A., Gusikhin, O., Sansone, C. (eds) Deep Learning Theory and Applications. DeLTA 2023. Communications in Computer and Information Science, vol 1875. Springer, Cham. https://doi.org/10.1007/978-3-031-39059-3_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-39058-6
Online ISBN: 978-3-031-39059-3
eBook Packages: Computer Science, Computer Science (R0)