Research article
DOI: 10.1145/3457682.3457751

Biological Named Entity Recognition and Role Labeling via Deep Multi-task Learning

Published: 21 June 2021

Abstract

Bioscience is an experimental science. The qualitative and quantitative findings of biological experiments are often available exclusively in the form of figures in published papers. In this paper, we introduce the SourceData model, which captures a key aspect of biological experimental design by categorizing each biological entity involved in an experiment into one of six roles. Our work aims to determine automatically, via natural language processing, whether a given entity is subjected to a perturbation or is the object of a measurement (entity role labeling). We use state-of-the-art transformer models (e.g., BERT and its variants) as strong baselines and find that, after joint training with a biological named entity recognition task through deep multi-task learning (MTL), the F1 score improves by 2% over the previous single-task architecture. For the named entity recognition task, the MTL method also achieves comparable performance on five public datasets. Further analysis reveals the importance of fusing entity information at the input layer of the entity role labeling task and of incorporating global context.
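The multi-task setup described in the abstract — a shared encoder whose output feeds both a named entity recognition head and an entity role labeling head, trained on a joint loss — can be sketched as follows. This is a hypothetical illustration with toy dimensions, random untrained weights, and a lookup table standing in for the transformer encoder; it is not the authors' implementation.

```python
import math
import random

random.seed(0)

# Toy dimensions; a real system would use a transformer encoder such as BERT.
HIDDEN, N_NER_TAGS, N_ROLES = 8, 5, 6  # six roles, per the SourceData model

def rand_matrix(rows, cols):
    return [[random.gauss(0, 0.1) for _ in range(cols)] for _ in range(rows)]

# Shared encoder stand-in plus one linear head per task.
EMBED = rand_matrix(100, HIDDEN)          # token id -> shared hidden vector
W_NER = rand_matrix(HIDDEN, N_NER_TAGS)   # NER tagging head
W_ROLE = rand_matrix(HIDDEN, N_ROLES)     # entity role labeling head

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def head(hidden, weights):
    # Linear layer followed by softmax: one probability per label.
    logits = [sum(h * w for h, w in zip(hidden, col)) for col in zip(*weights)]
    return softmax(logits)

def multitask_forward(token_ids):
    """Shared representation feeds both task heads (the MTL core idea)."""
    ner_out, role_out = [], []
    for tid in token_ids:
        h = EMBED[tid]                    # shared encoding, used by both heads
        ner_out.append(head(h, W_NER))
        role_out.append(head(h, W_ROLE))
    return ner_out, role_out

def joint_loss(ner_out, role_out, ner_gold, role_gold, alpha=0.5):
    """Weighted sum of the two tasks' per-token cross-entropy losses."""
    ner_ce = -sum(math.log(p[g]) for p, g in zip(ner_out, ner_gold)) / len(ner_gold)
    role_ce = -sum(math.log(p[g]) for p, g in zip(role_out, role_gold)) / len(role_gold)
    return alpha * ner_ce + (1 - alpha) * role_ce
```

Training would backpropagate the joint loss through the shared encoder, so gradients from both tasks shape the same representation; that sharing is the mechanism to which the reported F1 gain of the MTL architecture is attributed.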

Cited By

  • (2024) Transformer-Based Named Entity Recognition in Construction Supply Chain Risk Management in Australia. IEEE Access, 12, 41829-41851. https://doi.org/10.1109/ACCESS.2024.3377232
  • (2022) Learning twofold heterogeneous multi-task by sharing similar convolution kernel pairs. Knowledge-Based Systems, 252:C. https://doi.org/10.1016/j.knosys.2022.109396


Published In

ICMLC '21: Proceedings of the 2021 13th International Conference on Machine Learning and Computing
February 2021
601 pages
ISBN:9781450389310
DOI:10.1145/3457682

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. entity role labeling
  2. figure caption
  3. multi-task learning
  4. named entity recognition

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICMLC 2021


