Abstract
NLP technologies and components are increasingly used for the large-scale analysis of text-based dialogues, for example as classifiers for sentiment polarity, trend clustering of online messages, and hate speech detection. In this work we present the design and implementation of an automatic classification tool for evaluating the complexity of Italian texts as understood by a speaker of Italian as a second language. The classification is done within the Common European Framework of Reference for Languages (CEFR), which aims at classifying speakers' language proficiency. Results of preliminary experiments on a data set of real texts, annotated by experts and used in actual CEFR exam sessions, show a strong ability of the proposed system to label texts with the correct language proficiency class and a great potential for its integration in learning tools, such as systems supporting examiners in test design and in the automatic evaluation of writing abilities.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Milani, A., Spina, S., Santucci, V., Piersanti, L., Simonetti, M., Biondi, G. (2019). Text Classification for Italian Proficiency Evaluation. In: Misra, S., et al. Computational Science and Its Applications – ICCSA 2019. ICCSA 2019. Lecture Notes in Computer Science, vol 11619. Springer, Cham. https://doi.org/10.1007/978-3-030-24289-3_61
DOI: https://doi.org/10.1007/978-3-030-24289-3_61
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-24288-6
Online ISBN: 978-3-030-24289-3
eBook Packages: Computer Science, Computer Science (R0)