CR-CTC: Consistency regularization on CTC for improved speech recognition

Yao, Zengwei; Kang, Wei; Yang, Xiaoyu; Kuang, Fangjun; Guo, Liyong; Zhu, Han; Jin, Zengrui; Li, Zhaoqing; Lin, Long; Povey, Daniel

arXiv:2410.05101 (eess)

[Submitted on 7 Oct 2024 (v1), last revised 14 Feb 2025 (this version, v4)]

Abstract: Connectionist Temporal Classification (CTC) is a widely used method for automatic speech recognition (ASR), renowned for its simplicity and computational efficiency. However, it often falls short in recognition performance. In this work, we propose Consistency-Regularized CTC (CR-CTC), which enforces consistency between two CTC distributions obtained from different augmented views of the input speech mel-spectrogram. We provide in-depth insights into its essential behaviors from three perspectives: 1) it conducts self-distillation between random pairs of sub-models that process different augmented views; 2) it learns contextual representation through masked prediction for positions within time-masked regions, especially when we increase the amount of time masking; 3) it suppresses extremely peaky CTC distributions, thereby reducing overfitting and improving generalization. Extensive experiments on the LibriSpeech, Aishell-1, and GigaSpeech datasets demonstrate the effectiveness of CR-CTC. It significantly improves CTC performance, achieving state-of-the-art results comparable to those attained by transducer systems or by systems combining CTC and attention-based encoder-decoder (CTC/AED). We release our code at this https URL.
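
The core idea, consistency between the per-frame distributions produced from two augmented views, can be illustrated with a small sketch. The abstract does not state which divergence is used or how it is weighted, so the frame-wise symmetric KL divergence below is an assumption chosen for illustration, and the helper names (`softmax`, `kl_divergence`, `consistency_loss`) are hypothetical rather than taken from the released code. The sketch uses only the Python standard library and omits the CTC loss itself, the augmentation, and any gradient handling.

```python
import math


def softmax(logits):
    """Convert one frame of logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(
        pi * (math.log(pi + eps) - math.log(qi + eps))
        for pi, qi in zip(p, q)
    )


def consistency_loss(logits_a, logits_b):
    """Sum over frames of the symmetric KL between two views.

    logits_a, logits_b: lists of frames, each frame a list of
    per-token logits (blank included). Both views must have the
    same number of frames. Assumed form, not the authors' code.
    """
    if len(logits_a) != len(logits_b):
        raise ValueError("both views must have the same number of frames")
    total = 0.0
    for frame_a, frame_b in zip(logits_a, logits_b):
        p_a, p_b = softmax(frame_a), softmax(frame_b)
        total += 0.5 * (kl_divergence(p_a, p_b) + kl_divergence(p_b, p_a))
    return total


if __name__ == "__main__":
    # Two hypothetical views: 2 frames, 3 output tokens each.
    view_a = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]]
    view_b = [[0.0, 1.5, 0.5], [0.3, 0.2, 0.1]]
    print(consistency_loss(view_a, view_b))
```

In a training setup, a term like this would presumably be added to the CTC losses of the two views with some weighting factor. That weighting, and whether gradients flow through both sides of each KL term, are details the abstract leaves unspecified.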

Comments:	Published as a conference paper at ICLR 2025
Subjects:	Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
Cite as:	arXiv:2410.05101 [eess.AS]
	(or arXiv:2410.05101v4 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2410.05101

Submission history

From: Zengwei Yao
[v1] Mon, 7 Oct 2024 14:56:07 UTC (421 KB)
[v2] Sun, 13 Oct 2024 13:35:04 UTC (583 KB)
[v3] Sun, 8 Dec 2024 13:17:19 UTC (587 KB)
[v4] Fri, 14 Feb 2025 13:13:03 UTC (587 KB)
