Abstract
In recent years, Automatic Text Summarization (ATS) has been considered one of the main tasks in Natural Language Processing (NLP), generating summaries in several languages (e.g., English, Portuguese, and Spanish). Some of the most significant advances in ATS have been made for Portuguese, as reflected in the various state-of-the-art methods proposed for it. It is essential to know how these methods perform with respect to the upper bounds (Topline), the lower bounds (Baseline-random), and other heuristics (Baseline-first). Recent works calculated the significance and upper bounds for Single-Document Summarization (SDS) and Multi-Document Summarization (MDS) using corpora from the Document Understanding Conferences (DUC). In this paper, we calculate upper bounds for SDS in Portuguese using Genetic Algorithms (GA). Moreover, we compare several state-of-the-art methods against the upper bounds, lower bounds, and heuristics to determine their level of significance.
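The three reference points named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the usual reading of the terms: Baseline-first takes the leading sentences, Baseline-random takes sentences at random, and the Topline is the best extract a genetic algorithm can find against a human reference summary. The function names, the GA parameters, the whitespace tokenizer, and the simplified unigram-recall score (a stand-in for ROUGE, not the official package) are all assumptions made for illustration.

```python
import random
from collections import Counter


def rouge1_recall(candidate_sents, reference):
    """Clipped unigram recall of a candidate extract against one reference.

    Simplified stand-in for ROUGE-1 recall (whitespace tokens, lowercasing).
    """
    ref = Counter(reference.lower().split())
    cand = Counter(w for s in candidate_sents for w in s.lower().split())
    overlap = sum(min(count, cand[w]) for w, count in ref.items())
    return overlap / max(1, sum(ref.values()))


def baseline_first(sentences, k):
    """Heuristic baseline: the first k sentences of the document."""
    return sentences[:k]


def baseline_random(sentences, k, rng):
    """Lower-bound baseline: k sentences drawn at random, in document order."""
    idx = sorted(rng.sample(range(len(sentences)), k))
    return [sentences[i] for i in idx]


def topline_ga(sentences, reference, k, rng,
               pop_size=30, generations=40, mutation_rate=0.2):
    """Approximate the upper bound: search for the k-sentence extract that
    maximizes the score, using an elitist genetic algorithm (requires k <= n).

    An individual is a sorted tuple of k distinct sentence indices.
    """
    n = len(sentences)

    def fitness(ind):
        return rouge1_recall([sentences[i] for i in ind], reference)

    def random_individual():
        return tuple(sorted(rng.sample(range(n), k)))

    def crossover(a, b):
        # The child draws k indices from the union of both parents.
        pool = sorted(set(a) | set(b))
        return tuple(sorted(rng.sample(pool, k)))

    def mutate(ind):
        # Occasionally swap one selected sentence for an unselected one.
        if k < n and rng.random() < mutation_rate:
            genes = list(ind)
            outside = [i for i in range(n) if i not in genes]
            genes[rng.randrange(k)] = rng.choice(outside)
            ind = tuple(sorted(genes))
        return ind

    # Assumption: seed the population with the Baseline-first extract so the
    # result can never score below that heuristic.
    population = [tuple(range(k))]
    population += [random_individual() for _ in range(pop_size - 1)]

    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        elite = population[: pop_size // 2]  # elitism: the best always survive
        children = [mutate(crossover(rng.choice(elite), rng.choice(elite)))
                    for _ in range(pop_size - len(elite))]
        population = elite + children

    best = max(population, key=fitness)
    return [sentences[i] for i in best]


# Toy data, invented for illustration only.
toy_doc = ["the city council approved the budget",
           "the weather was mild on tuesday",
           "the budget raises school funding",
           "local teams played over the weekend",
           "critics say the funding is still too low"]
toy_ref = "the council approved a budget that raises school funding"
toy_rng = random.Random(0)
for name, extract in [
        ("Baseline-first", baseline_first(toy_doc, 2)),
        ("Baseline-random", baseline_random(toy_doc, 2, toy_rng)),
        ("Topline (GA)", topline_ga(toy_doc, toy_ref, 2, toy_rng))]:
    print(name, round(rouge1_recall(extract, toy_ref), 3))
```

Because the fittest half of each generation is carried over unchanged and the initial population contains the Baseline-first extract, the score returned by this sketch is never lower than that heuristic. A state-of-the-art method's score can then be placed between the random baseline and the topline to judge how much of the attainable range it covers.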
Notes
- 1. DUC website: https://www-nlpir.nist.gov/projects/duc/; TAC website: https://tac.nist.gov/.
- 2.
- 3.
- 4. Each segmentation can be downloaded from https://gitlab.com/JohnRojas/Corpus-TeMario.
- 5.
- 6. http://www.shvoong.com/summarizer/ (URL viewed May 7th, 2017).
- 7. https://github.com/neopunisher/Open-Text-Summarizer/ (URL viewed February 10th, 2018).
Acknowledgements
This work was partially supported by the Mexican Government's CONACyT Thematic Network program (Language Technologies Thematic Network project 295022). We also thank UAEMex for its support.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Rojas-Simón, J., Ledeneva, Y., García-Hernández, R.A. (2018). Calculating the Upper Bounds for Portuguese Automatic Text Summarization Using Genetic Algorithm. In: Simari, G.R., Fermé, E., Gutiérrez Segura, F., Rodríguez Melquiades, J.A. (eds) Advances in Artificial Intelligence – IBERAMIA 2018. IBERAMIA 2018. Lecture Notes in Computer Science, vol 11238. Springer, Cham. https://doi.org/10.1007/978-3-030-03928-8_36
DOI: https://doi.org/10.1007/978-3-030-03928-8_36
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-03927-1
Online ISBN: 978-3-030-03928-8
eBook Packages: Computer Science, Computer Science (R0)