Gradually Improving the Computation of Semantic Textual Similarity in Portuguese

Gonçalo Oliveira, Hugo; Oliveira Alves, Ana; Rodrigues, Ricardo

doi:10.1007/978-3-319-65340-2_68

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10423)

Included in the following conference series:

EPIA Conference on Artificial Intelligence

Abstract

There is much research on Semantic Textual Similarity (STS) in English, especially since its inclusion in the SemEval evaluations. For other languages, such research is less common, mostly due to the unavailability of benchmarks. Recently, the ASSIN shared task targeted STS in Portuguese and released training and test collections. This paper describes an incremental approach to ASSIN, where the computed similarity is gradually improved by exploiting different features (e.g., token overlap, semantic relations, chunks, and negation) and approaches. The best reported results, obtained with a supervised approach, would have placed second overall in ASSIN.
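To make two of the feature types named in the abstract more concrete, the sketch below computes a token-overlap score and a negation flag for a sentence pair. It is an illustration only, not the authors' implementation: the Jaccard coefficient, the regex tokenizer, the `NEGATION_WORDS` list, and all function names are assumptions introduced here. In a supervised setting such as the one the abstract mentions, values like these would presumably be passed as inputs to a learned regressor.

```python
import re

# Hypothetical, illustrative list of Portuguese negation markers
# (an assumption; the list used in the paper is not given on this page).
NEGATION_WORDS = {"não", "nunca", "jamais", "nem", "nenhum", "nenhuma"}


def tokenize(sentence):
    """Lowercase the sentence and split it into word tokens."""
    return re.findall(r"\w+", sentence.lower())


def jaccard_overlap(tokens_a, tokens_b):
    """Jaccard coefficient of the two token sets (0.0 if both are empty)."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


def negation_mismatch(tokens_a, tokens_b):
    """Return 1.0 if exactly one sentence has a negation marker, else 0.0."""
    has_a = any(token in NEGATION_WORDS for token in tokens_a)
    has_b = any(token in NEGATION_WORDS for token in tokens_b)
    return 1.0 if has_a != has_b else 0.0


def pair_features(sentence_a, sentence_b):
    """Build a small feature dictionary for one sentence pair."""
    tokens_a, tokens_b = tokenize(sentence_a), tokenize(sentence_b)
    return {
        "token_overlap": jaccard_overlap(tokens_a, tokens_b),
        "negation_mismatch": negation_mismatch(tokens_a, tokens_b),
    }


if __name__ == "__main__":
    print(pair_features("O gato dorme no sofá", "O gato não dorme no sofá"))
```

The example pair differs only by the word "não", so the overlap score stays high while the negation flag is set. This shows why a negation feature can complement plain token overlap.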

Notes

1. http://nilc.icmc.usp.br/assin/.
2. http://opennlp.apache.org/.
3. NLP tools available from https://github.com/rikarudo/.
4. PTStemmer is available from https://code.google.com/archive/p/ptstemmer/.
5. http://paginas.fe.up.pt/~arocha/AED1/0607/trabalhos/thesaurus.txt.
6. http://pt.wiktionary.org.

Acknowledgements

This work was financed by the European Regional Development Fund (ERDF) through the COMPETE Programme (operational programme for competitiveness) and by National Funds through the Fundação para a Ciência e a Tecnologia (FCT, Portuguese Foundation for Science and Technology) within project REMINDS – UTAP-ICDT/EEI-CTP/0022/2014.

Author information

Authors and Affiliations

CISUC, DEI, University of Coimbra, Coimbra, Portugal
Hugo Gonçalo Oliveira, Ana Oliveira Alves & Ricardo Rodrigues
ISEC, Polytechnic Institute of Coimbra, Coimbra, Portugal
Ana Oliveira Alves
ESEC, Polytechnic Institute of Coimbra, Coimbra, Portugal
Ricardo Rodrigues

Corresponding authors

Correspondence to Hugo Gonçalo Oliveira or Ana Oliveira Alves.

Editor information

Editors and Affiliations

Universidade do Porto, Porto, Portugal
Eugénio Oliveira
Universidade do Porto, Porto, Portugal
João Gama
Polytechnic Institute of Porto, Porto, Portugal
Zita Vale
Universidade do Porto, Porto, Portugal
Henrique Lopes Cardoso

About this paper

Cite this paper

Gonçalo Oliveira, H., Oliveira Alves, A., Rodrigues, R. (2017). Gradually Improving the Computation of Semantic Textual Similarity in Portuguese. In: Oliveira, E., Gama, J., Vale, Z., Lopes Cardoso, H. (eds) Progress in Artificial Intelligence. EPIA 2017. Lecture Notes in Computer Science, vol 10423. Springer, Cham. https://doi.org/10.1007/978-3-319-65340-2_68

DOI: https://doi.org/10.1007/978-3-319-65340-2_68
Published: 09 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-65339-6
Online ISBN: 978-3-319-65340-2
eBook Packages: Computer Science; Computer Science (R0)
