The Complementary Nature of Different NLP Toolkits for Named Entity Recognition in Social Media

  • Conference paper
Progress in Artificial Intelligence (EPIA 2017)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 10423))

Abstract

In this paper we study the combined use of four different NLP toolkits (Stanford CoreNLP, GATE, OpenNLP and Twitter NLP tools) on social media posts. Previous studies have compared the performance of these tools on both news and social media corpora. We go further by examining how differently these toolkits predict named entities, in terms of their precision and recall for three entity types, and how they can complement each other to achieve a combined performance superior to that of any individual toolkit. Experiments on two publicly available datasets, from the WNUT-2015 and #MSM2013 workshops, show that an ensemble of toolkits can improve the recognition of specific entity types: by up to 10.62% for Person, 1.97% for Location and 1.31% for Organization, depending on the dataset and the voting criteria. The average performance over the three entity types also improved, by 3.76% and 1.69% on the two datasets respectively.
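The abstract does not spell out the voting criteria, so the following is only a minimal sketch of how a token-level voting ensemble and a per-entity-type precision/recall check might look. The toolkit keys, the label names, the `min_votes` threshold and the toy predictions are all assumptions made for illustration; they are not the authors' implementation or data.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Maps a (hypothetical) toolkit name to its per-token labels for one post.
Predictions = Dict[str, List[str]]


def vote(predictions: Predictions, min_votes: int = 2) -> List[str]:
    """Combine per-token labels from several toolkits by plurality vote.

    A token keeps an entity label only if at least `min_votes` toolkits
    agree on it; otherwise it is labelled "O" (no entity). `min_votes` is
    an assumed stand-in for the voting criteria mentioned in the abstract.
    Ties are resolved in favour of the label seen first.
    """
    combined = []
    for labels in zip(*predictions.values()):  # one tuple of labels per token
        counts = Counter(label for label in labels if label != "O")
        if counts:
            label, n = counts.most_common(1)[0]
            combined.append(label if n >= min_votes else "O")
        else:
            combined.append("O")
    return combined


def precision_recall(gold: List[str], predicted: List[str],
                     entity: str) -> Tuple[float, float]:
    """Token-level precision and recall for a single entity type."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == entity and p == entity for g, p in pairs)
    fp = sum(g != entity and p == entity for g, p in pairs)
    fn = sum(g == entity and p != entity for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented toy post: "Obama visited Porto with Google"
gold = ["PERSON", "O", "LOCATION", "O", "ORGANIZATION"]
toy = {
    "stanford": ["PERSON", "O", "LOCATION", "O", "O"],
    "gate": ["PERSON", "O", "O", "O", "ORGANIZATION"],
    "opennlp": ["O", "O", "LOCATION", "O", "O"],
    "twitter_nlp": ["PERSON", "O", "LOCATION", "O", "ORGANIZATION"],
}
ensemble = vote(toy, min_votes=2)
```

In this toy case each toolkit misses at least one entity that another one finds, which is the kind of complementarity the abstract describes. Raising `min_votes` trades recall for precision, since fewer tokens reach the agreement threshold.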

Notes

  1. https://stanfordnlp.github.io/CoreNLP/

  2. https://gate.ac.uk/download/

  3. https://github.com/aritter/twitter_nlp

  4. https://opennlp.apache.org/

Acknowledgments

This work is supported by the ERDF (European Regional Development Fund) through the COMPETE Programme (operational programme for competitiveness) and by National Funds through the FCT (Portuguese Foundation for Science and Technology) within project “Reminds/UTAP-ICDT/EEI-CTP/0022/2014”.

Author information

Corresponding author

Correspondence to Filipe Batista.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Batista, F., Figueira, Á. (2017). The Complementary Nature of Different NLP Toolkits for Named Entity Recognition in Social Media. In: Oliveira, E., Gama, J., Vale, Z., Lopes Cardoso, H. (eds) Progress in Artificial Intelligence. EPIA 2017. Lecture Notes in Computer Science, vol 10423. Springer, Cham. https://doi.org/10.1007/978-3-319-65340-2_65

  • DOI: https://doi.org/10.1007/978-3-319-65340-2_65

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-65339-6

  • Online ISBN: 978-3-319-65340-2

  • eBook Packages: Computer Science, Computer Science (R0)
