
Chinese Spelling Error Detection and Correction Based on Knowledge Graph

  • Conference paper
  • First Online:
Database Systems for Advanced Applications. DASFAA 2022 International Workshops (DASFAA 2022)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13248))


Abstract

Spelling error correction is the task of detecting and correcting errors in natural language sentences. In this paper, we consider Chinese spelling error correction (CSC) for generality. A previous state-of-the-art method for this task connects a detection network with a BERT-based correction network via soft masking. This method does address BERT's insufficient ability to locate error positions. However, by analyzing its results, we find that it still lacks adequate inference ability and world knowledge. To address this issue, we propose a novel correction approach based on knowledge graphs (KGs), which queries triples from KGs and injects them into sentences as domain knowledge. Moreover, we leverage masked language modeling (MLM) for correction to improve BERT's inference ability, and we adopt a denoising filter to increase the accuracy of the results. Experimental results on the SIGHAN dataset verify that our approach outperforms state-of-the-art methods.
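The pipeline the abstract describes can be sketched schematically. The code below is a minimal illustration, not the authors' implementation: the knowledge graph, confusion set, and scoring function are toy stand-ins for the real CN-DBpedia/HowNet lookups and a BERT masked-language-model scorer, and the margin-based filter only hints at the denoising step.

```python
# Sketch (assumed, simplified): (1) query triples from a KG for entities found
# in the sentence and append them as extra context; (2) for each character, let
# a scoring function (standing in for a BERT MLM) rank confusion-set
# candidates; (3) a denoising filter keeps a candidate only if it beats the
# original character's score by a margin.

# Toy KG: entity -> list of (relation, object) triples. The paper queries
# real KGs such as CN-DBpedia and HowNet instead.
TOY_KG = {
    "北京": [("is-a", "city"), ("capital-of", "China")],
}

def inject_triples(sentence: str, kg: dict) -> str:
    """Append KG triples for entities that appear in the sentence."""
    facts = []
    for entity, triples in kg.items():
        if entity in sentence:
            facts.extend(f"{entity} {rel} {obj}" for rel, obj in triples)
    return sentence + (" [SEP] " + " ; ".join(facts) if facts else "")

def correct(sentence: str, candidates: dict, score, margin: float = 0.1) -> str:
    """Replace a character with its best confusion-set candidate only when
    the candidate's score exceeds the original's by `margin` (the filter)."""
    chars = list(sentence)
    for i, ch in enumerate(chars):
        best = max(candidates.get(ch, [ch]), key=lambda c: score(chars, i, c))
        if score(chars, i, best) - score(chars, i, ch) > margin:
            chars[i] = best
    return "".join(chars)
```

With a deterministic toy scorer that prefers "京" after "北", `correct("北今是首都", {"今": ["京", "今"]}, toy_score)` would repair the sentence to "北京是首都"; a real system would obtain these scores from an MLM conditioned on the KG-augmented sentence.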





Author information

Correspondence to Jing Zhou.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Sun, X., Zhou, J., Wang, S., Li, H., Jia, J., Zhu, J. (2022). Chinese Spelling Error Detection and Correction Based on Knowledge Graph. In: Rage, U.K., Goyal, V., Reddy, P.K. (eds) Database Systems for Advanced Applications. DASFAA 2022 International Workshops. DASFAA 2022. Lecture Notes in Computer Science, vol 13248. Springer, Cham. https://doi.org/10.1007/978-3-031-11217-1_11


  • DOI: https://doi.org/10.1007/978-3-031-11217-1_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-11216-4

  • Online ISBN: 978-3-031-11217-1

  • eBook Packages: Computer Science (R0)

