The Complementary Nature of Different NLP Toolkits for Named Entity Recognition in Social Media

Batista, Filipe; Figueira, Álvaro

doi:10.1007/978-3-319-65340-2_65

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10423)

Included in the following conference series:

EPIA Conference on Artificial Intelligence

Abstract

In this paper we study the combined use of four NLP toolkits (Stanford CoreNLP, GATE, OpenNLP and Twitter NLP tools) on social media posts. Previous studies have compared the performance of these tools on both news and social media corpora. We go further by examining how differently these toolkits predict named entities, in terms of precision and recall for three entity types, and how they can complement each other so that their combined performance exceeds that of each individual toolkit. Experiments on two publicly available datasets, from the WNUT-2015 and #MSM2013 workshops, show that an ensemble of toolkits can improve the recognition of specific entity types: by up to 10.62% for Person, 1.97% for Location and 1.31% for Organization, depending on the dataset and the voting criteria used. Our results also show improvements of 3.76% and 1.69%, in each dataset respectively, in the average performance across the three entity types.
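The abstract does not spell out the voting criteria, so the following is only a minimal sketch of what voting over toolkit outputs and per-type precision/recall might look like. The toolkit keys, the label names, the `min_votes` threshold and the token-level scoring are all assumptions made for illustration; they are not taken from the paper, and shared-task evaluations usually score whole entity spans rather than single tokens.

```python
from collections import Counter

# Hypothetical keys for the four toolkits; the per-token labels would come
# from running each toolkit separately (not shown here).
TOOLKITS = ("corenlp", "gate", "opennlp", "twitter_nlp")


def vote(labels, min_votes=2):
    """Return the most-voted entity type among the toolkits' labels.

    "O" (no entity) votes are ignored. If the winning type has fewer than
    `min_votes` votes, the token is labelled "O". Ties go to the type that
    appears first in `labels`.
    """
    counts = Counter(label for label in labels if label != "O")
    if not counts:
        return "O"
    label, n = counts.most_common(1)[0]
    return label if n >= min_votes else "O"


def precision_recall(predicted, gold, entity_type):
    """Token-level precision and recall for one entity type (a simplification)."""
    tp = sum(p == entity_type and g == entity_type for p, g in zip(predicted, gold))
    n_pred = sum(p == entity_type for p in predicted)
    n_gold = sum(g == entity_type for g in gold)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    return precision, recall


# Made-up predictions for the tokens ["Obama", "visited", "Porto"].
predictions = {
    "corenlp":     ["PERSON", "O", "LOCATION"],
    "gate":        ["PERSON", "O", "O"],
    "opennlp":     ["O",      "O", "LOCATION"],
    "twitter_nlp": ["PERSON", "O", "ORGANIZATION"],
}
gold = ["PERSON", "O", "LOCATION"]

# Combine the toolkits column by column (one column per token).
ensemble = [vote(column) for column in zip(*(predictions[t] for t in TOOLKITS))]
print(ensemble)
print(precision_recall(ensemble, gold, "LOCATION"))
```

Raising `min_votes` would be expected to favour precision over recall, and lowering it the reverse. This is one way the choice of voting criterion could change which entity types benefit from the ensemble.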

Acknowledgments

This work is supported by the European Regional Development Fund (ERDF) through the COMPETE Programme (operational programme for competitiveness) and by National Funds through the FCT (Portuguese Foundation for Science and Technology) within project “Reminds/UTAP-ICDT/EEI-CTP/0022/2014”.

Author information

Authors and Affiliations

CRACS/INESC TEC and University of Porto, Rua do Campo Alegre, 1021/1055, 4169-007, Porto, Portugal
Filipe Batista & Álvaro Figueira

Corresponding author

Correspondence to Filipe Batista.

Editor information

Editors and Affiliations

Universidade do Porto, Porto, Portugal
Eugénio Oliveira
Universidade do Porto, Porto, Portugal
João Gama
Polytechnic Institute of Porto, Porto, Portugal
Zita Vale
Universidade do Porto, Porto, Portugal
Henrique Lopes Cardoso

About this paper

Cite this paper

Batista, F., Figueira, Á. (2017). The Complementary Nature of Different NLP Toolkits for Named Entity Recognition in Social Media. In: Oliveira, E., Gama, J., Vale, Z., Lopes Cardoso, H. (eds) Progress in Artificial Intelligence. EPIA 2017. Lecture Notes in Computer Science, vol 10423. Springer, Cham. https://doi.org/10.1007/978-3-319-65340-2_65

DOI: https://doi.org/10.1007/978-3-319-65340-2_65
Published: 09 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-65339-6
Online ISBN: 978-3-319-65340-2
eBook Packages: Computer Science, Computer Science (R0)
