Abstract
To identify noun synonyms automatically, we propose contrasting the classical polysemous representation of a word with monosemous representations based on the “one sense per discourse” hypothesis. To that end, we apply the attributional similarity paradigm at two levels: corpus and document. We evaluate our methodology on well-known standard multiple-choice synonymy question tests and show that it consistently outperforms the baseline.
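The two levels of comparison can be illustrated with a small Python sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: attributes are plain co-occurrence counts in a fixed window, similarity is the cosine, and the document-level score is the average over all pairs of per-document vectors. The function names and the toy corpus are hypothetical.

```python
from collections import Counter
from math import sqrt


def context_vector(docs, word, window=2):
    """Count tokens that occur within +/- `window` positions of `word`."""
    vec = Counter()
    for tokens in docs:
        for i, tok in enumerate(tokens):
            if tok == word:
                lo, hi = max(0, i - window), i + window + 1
                vec.update(t for j, t in enumerate(tokens[lo:hi], lo) if j != i)
    return vec


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def corpus_similarity(docs, a, b):
    """Corpus level: one vector per word, merging all of its senses."""
    return cosine(context_vector(docs, a), context_vector(docs, b))


def per_document_vectors(docs, word):
    """Document level: one vector per document in which `word` has context."""
    vectors = [context_vector([d], word) for d in docs]
    return [v for v in vectors if v]


def document_similarity(docs, a, b):
    """Assumed aggregation: mean cosine over all pairs of per-document vectors."""
    sims = [cosine(x, y)
            for x in per_document_vectors(docs, a)
            for y in per_document_vectors(docs, b)]
    return sum(sims) / len(sims) if sims else 0.0


def answer(docs, target, candidates, similarity):
    """Answer a multiple-choice synonymy question with the given similarity."""
    return max(candidates, key=lambda c: similarity(docs, target, c))


# Hypothetical toy corpus: each document is a list of tokens.
docs = [
    "the car drove fast".split(),
    "the automobile drove fast".split(),
    "a banana tastes sweet".split(),
]

print(answer(docs, "car", ["automobile", "banana"], corpus_similarity))
print(answer(docs, "car", ["automobile", "banana"], document_similarity))
```

Under the “one sense per discourse” hypothesis, each per-document vector is expected to reflect a single sense of the word, whereas the corpus-level vector mixes all of its senses. This is why the sketch keeps the two similarity functions separate.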
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dias, G., Moraliyski, R. (2009). Relieving Polysemy Problem for Synonymy Detection. In: Lopes, L.S., Lau, N., Mariano, P., Rocha, L.M. (eds) Progress in Artificial Intelligence. EPIA 2009. Lecture Notes in Computer Science, vol 5816. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04686-5_50
DOI: https://doi.org/10.1007/978-3-642-04686-5_50
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04685-8
Online ISBN: 978-3-642-04686-5
eBook Packages: Computer Science, Computer Science (R0)