Speaker Adaptation for End-To-End Speech Recognition Systems in Noisy Environments

Wagner, Dominik; Baumann, Ilja; Bayerl, Sebastian P.; Riedhammer, Korbinian; Bocklet, Tobias


arXiv:2211.08774 (cs)

[Submitted on 16 Nov 2022 (v1), last revised 7 Dec 2023 (this version, v3)]



Abstract: We analyze the impact of speaker adaptation in end-to-end automatic speech recognition models based on transformers and wav2vec 2.0 under different noise conditions. By including speaker embeddings obtained from x-vector and ECAPA-TDNN systems, as well as i-vectors, we achieve relative word error rate improvements of up to 16.3% on LibriSpeech and up to 14.5% on Switchboard. We show that the proven method of concatenating speaker vectors to the acoustic features and supplying them as auxiliary model inputs remains a viable option to increase the robustness of end-to-end architectures. The effect on transformer models is stronger when more noise is added to the input speech. The most substantial benefits for systems based on wav2vec 2.0 are achieved under moderate or no noise conditions. Both x-vectors and ECAPA-TDNN embeddings outperform i-vectors as speaker representations. The optimal embedding size depends on the dataset and also varies with the noise condition.
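
Illustration (not part of the paper): the abstract's idea of concatenating a speaker vector to the acoustic features can be sketched as below. This is a minimal toy example, assuming one fixed speaker vector per utterance that is appended to every frame. The function name `append_speaker_vector` and the tiny dimensions are hypothetical, and the paper's actual feature pipeline and embedding extractors are not shown here.

```python
from typing import List, Sequence


def append_speaker_vector(
    frames: Sequence[Sequence[float]],
    speaker_vec: Sequence[float],
) -> List[List[float]]:
    """Append the same per-utterance speaker vector to every acoustic frame.

    frames:      T frames, each with D acoustic features
    speaker_vec: one E-dimensional speaker representation (assumed precomputed)
    returns:     T frames, each with D + E values
    """
    return [list(frame) + list(speaker_vec) for frame in frames]


# Toy sizes chosen for readability; real features and embeddings are much larger.
frames = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]  # T=2 frames, D=3 features
speaker_vec = [1.0, -1.0]                     # E=2 speaker dimensions

augmented = append_speaker_vector(frames, speaker_vec)
print(len(augmented), len(augmented[0]))
```

Each frame keeps its original features and gains the speaker dimensions, so a downstream model would need an input layer sized for D + E values.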

Comments:	Accepted at ASRU 2023
Subjects:	Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2211.08774 [cs.SD]
	(or arXiv:2211.08774v3 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2211.08774

Submission history

From: Dominik Wagner
[v1] Wed, 16 Nov 2022 09:02:41 UTC (328 KB)
[v2] Sun, 11 Jun 2023 11:30:52 UTC (377 KB)
[v3] Thu, 7 Dec 2023 09:32:06 UTC (110 KB)
