Abstract
Spelling error correction is the task of detecting and correcting errors in natural language sentences. In this paper, we consider Chinese spelling error correction (CSC), without loss of generality. A previous state-of-the-art method for this task connects a BERT-based correction network to a detection network via soft masking, which addresses BERT's insufficient ability to locate errors. By analyzing its results, however, we find that it still lacks sufficient inference ability and world knowledge. To address this, we propose a novel correction approach based on knowledge graphs (KGs): we query triples from KGs and inject them into the sentences as domain knowledge. Moreover, we leverage masked language modeling (MLM) as correction to improve BERT's inference ability, and adopt a denoising filter to increase the accuracy of the results. Experimental results on the SIGHAN dataset show that our approach outperforms state-of-the-art methods.
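To make the pipeline described above concrete, the following is a minimal sketch of the idea, not the authors' implementation: the toy in-memory KG, the `query_triples` helper, and the fact-injection format are hypothetical illustrations, the detection step is assumed to supply the error position, and the denoising filter is reduced to taking the top-1 MLM candidate. It uses the Hugging Face `transformers` fill-mask pipeline with a Chinese BERT checkpoint.

```python
# Hedged sketch of KG-injected MLM correction (illustrative, not the paper's code).
from transformers import pipeline

# Toy knowledge graph: subject entity -> list of (relation, object) triples.
# Hypothetical data for illustration only.
KG = {"北京": [("是", "中国的首都")]}

def query_triples(sentence: str):
    """Return KG triples whose subject entity appears in the sentence."""
    return [(e, r, o) for e, rels in KG.items() if e in sentence
            for r, o in rels]

def inject_knowledge(sentence: str) -> str:
    """Prepend matched triples to the sentence as extra textual context."""
    facts = "".join(f"{e}{r}{o}。" for e, r, o in query_triples(sentence))
    return facts + sentence

def mlm_correct(sentence: str, position: int) -> str:
    """Mask the suspected error character (position assumed given by a
    detection step) and let BERT's MLM head propose a replacement."""
    fill = pipeline("fill-mask", model="bert-base-chinese")
    masked = sentence[:position] + fill.tokenizer.mask_token + sentence[position + 1:]
    best = fill(masked)[0]  # top-1 candidate; a real denoising filter would re-rank
    return sentence[:position] + best["token_str"] + sentence[position + 1:]

# Usage: inject knowledge, then correct the character at a detected position.
enriched = inject_knowledge("北京是中国的首都")
print(mlm_correct(enriched, position=len(enriched) - 1))
```

The sketch only illustrates the interface between the three stages (knowledge injection, MLM-based correction, candidate filtering); the paper's actual detection network, injection mechanism, and denoising filter are more involved.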
Cite this paper
Sun, X., Zhou, J., Wang, S., Li, H., Jia, J., Zhu, J. (2022). Chinese Spelling Error Detection and Correction Based on Knowledge Graph. In: Rage, U.K., Goyal, V., Reddy, P.K. (eds.) Database Systems for Advanced Applications. DASFAA 2022 International Workshops. Lecture Notes in Computer Science, vol. 13248. Springer, Cham. https://doi.org/10.1007/978-3-031-11217-1_11