TaxoSBERT: Unsupervised Taxonomy Expansion Through Expressive Semantic Similarity

  • Conference paper
Deep Learning Theory and Applications (DeLTA 2023)

Abstract

Knowledge graphs are crucial resources for a large set of document management tasks, such as text retrieval and classification as well as natural language inference. Standard examples are large-scale lexical semantic graphs, such as WordNet, useful for text tagging or sentence disambiguation purposes.

The dynamics of lexical taxonomies is a critical problem, as they need to be maintained to follow the evolution of language over time. Taxonomy expansion thus becomes a critical semantic task, as it allows existing resources to be extended with new properties and new entries, i.e. taxonomy concepts, to be created when necessary. Previous work on this topic suggests the use of neural learning methods that exploit the underlying taxonomy graph as a source of training evidence. This can be done through graph-based learning, where networks are trained to encode the underlying knowledge graph and to predict appropriate inferences.

This paper presents TaxoSBERT, a simple and effective way to model the taxonomy expansion problem as a retrieval task. It combines a robust semantic similarity measure with taxonomy-driven re-ranking strategies. The method is unsupervised: the adopted similarity measures are trained on large-scale resources outside the target taxonomy and are extremely efficient. The experimental evaluation on two taxonomies shows surprising results, improving on far more complex state-of-the-art methods.
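To make the retrieval framing concrete, the following is a minimal sketch and not the authors' implementation. It ranks existing taxonomy nodes as candidate parents of a new concept by cosine similarity between text representations. The `embed` function is a toy bag-of-words stand-in for a sentence encoder (an assumption made so the example runs without external models), the node descriptions are invented for illustration, and the taxonomy-driven re-ranking step is omitted because the abstract does not describe it.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence encoder: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_candidate_parents(query_text, taxonomy):
    """Return taxonomy node names sorted by decreasing similarity to the query."""
    q = embed(query_text)
    scored = [(cosine(q, embed(desc)), node) for node, desc in taxonomy.items()]
    # Sort by score (descending), breaking ties by node name for determinism.
    return [node for _, node in sorted(scored, key=lambda p: (-p[0], p[1]))]


# Hypothetical taxonomy: node name -> textual description (illustrative only).
toy_taxonomy = {
    "dog": "dog: a domesticated canine animal kept as a pet",
    "tree": "tree: a tall woody plant with a trunk and branches",
    "vehicle": "vehicle: a machine used to transport people or goods",
}

ranking = rank_candidate_parents("puppy: a young dog kept as a pet", toy_taxonomy)
print(ranking[0])  # top-ranked candidate parent
```

In a realistic setting, `embed` would be replaced by a pretrained sentence encoder, and the node vectors would be computed once and reused for every query.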

Notes

  1. See WordNet for sense 1 of the noun “center field”.

  2. The source code is publicly available at https://github.com/crux82/TaxoSBERT.

  3. With state-of-the-art Transformer-based architectures, such as the BERT-based ones, classifying a text pair requires encoding the pair, which is a computationally expensive task.

  4. The hyponymy relation does not correspond to a perfect DAG in WordNet, as multiple inheritance is occasionally needed for some synsets (nodes). However, for the Taxonomy Enrichment task this assumption is always satisfied, so our definition is fully consistent within the scope of this paper.

  5. https://huggingface.co/sentence-transformers/all-mpnet-base-v2.
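The cost argument in note 3 can be illustrated with a rough count of encoder forward passes. This is a back-of-the-envelope sketch under stated assumptions: the function name and the sizes are hypothetical, and cheap vector comparisons are ignored. Scoring every (query, node) pair jointly needs one pass per pair, whereas encoding each text on its own, as Sentence-BERT-style models do, needs one pass per text.

```python
def encoder_passes(num_queries, num_nodes):
    """Count Transformer forward passes for two scoring schemes."""
    # Joint scoring: every (query, node) pair is encoded together.
    pairwise = num_queries * num_nodes
    # Independent encoding: each text is encoded once and its vector reused.
    independent = num_queries + num_nodes
    return pairwise, independent


# Hypothetical sizes, chosen only for illustration.
pairwise, independent = encoder_passes(num_queries=1_000, num_nodes=50_000)
print(pairwise, independent)
```

With these illustrative sizes, joint scoring requires 50,000,000 passes against 51,000 for independent encoding.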

Acknowledgements

We would like to thank the Istituto di Analisi dei Sistemi ed Informatica - Antonio Ruberti (IASI) for supporting the experiments through access to dedicated computing resources. We acknowledge financial support from the PNRR MUR project PE0000013-FAIR.

Author information

Corresponding author

Correspondence to Daniele Margiotta.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Margiotta, D., Croce, D., Basili, R. (2023). TaxoSBERT: Unsupervised Taxonomy Expansion Through Expressive Semantic Similarity. In: Conte, D., Fred, A., Gusikhin, O., Sansone, C. (eds) Deep Learning Theory and Applications. DeLTA 2023. Communications in Computer and Information Science, vol 1875. Springer, Cham. https://doi.org/10.1007/978-3-031-39059-3_20

  • DOI: https://doi.org/10.1007/978-3-031-39059-3_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-39058-6

  • Online ISBN: 978-3-031-39059-3

  • eBook Packages: Computer Science, Computer Science (R0)
