
Text Classification for Italian Proficiency Evaluation

  • Conference paper
Computational Science and Its Applications – ICCSA 2019 (ICCSA 2019)

Abstract

NLP technologies and components are increasingly used in the large-scale analysis of text-based dialogues, for example as classifiers for sentiment polarity, trend clustering of online messages, and hate speech detection. In this work we present the design and implementation of an automatic classification tool for evaluating the complexity of Italian texts as understood by a speaker of Italian as a second language. The classification is carried out within the Common European Framework of Reference for Languages (CEFR), which aims at classifying speakers' language proficiency. Results of preliminary experiments on a data set of real texts, annotated by experts and used in actual CEFR exam sessions, show a strong ability of the proposed system to label texts with the correct language proficiency class and a great potential for its integration in learning tools, such as systems supporting examiners in test design and the automatic evaluation of writing abilities.



Author information


Corresponding author

Correspondence to Giulio Biondi.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Milani, A., Spina, S., Santucci, V., Piersanti, L., Simonetti, M., Biondi, G. (2019). Text Classification for Italian Proficiency Evaluation. In: Misra, S., et al. Computational Science and Its Applications – ICCSA 2019. ICCSA 2019. Lecture Notes in Computer Science, vol 11619. Springer, Cham. https://doi.org/10.1007/978-3-030-24289-3_61


  • DOI: https://doi.org/10.1007/978-3-030-24289-3_61


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-24288-6

  • Online ISBN: 978-3-030-24289-3

  • eBook Packages: Computer Science, Computer Science (R0)
