Procesamiento del Lenguaje Natural, Journal no. 74, March 2025, pp. 395-398. Received 04-12-2024, revised 21-01-2025, accepted 31-01-2025.

Deep learning applied to speech processing: Development of novel models and techniques
Aprendizaje profundo aplicado al procesamiento de voz:
Desarrollo de nuevos modelos y técnicas
Roberto Andrés Carofilis Vasco
Departamento de Ingeniería Eléctrica y de Sistemas y Automática, Universidad de León
Campus de Vegazana, s/n, 24007 León, España
andres.vasco@unileon.es

Abstract: This is the summary of the Ph.D. thesis conducted by Roberto Andrés
Carofilis Vasco, under the supervision of Prof. Enrique Alegre Gutiérrez and Prof.
Laura Fernández Robles at the University of León. The thesis defense took place in
León, Spain, on December 20, 2023, in the presence of a committee formed by Dr.
Luis Fernando D’Haro (Polytechnic University of Madrid, Spain), Dr. Kenneth P.
Camilleri (University of Malta, Malta), and Dr. Victor González Castro (University
of León, Spain). The thesis received international mention following a 3-month stay
at the Idiap Research Institute in Switzerland, under the supervision of Dr. Petr
Motlicek. The thesis was awarded the distinction of outstanding cum laude.
Keywords: Speech processing, language identification, accent identification, speaker identification.
Resumen: Este es el resumen de la tesis doctoral realizada por Roberto Andrés Carofilis Vasco, bajo la dirección del Prof. Enrique Alegre Gutiérrez y la Prof. Laura Fernández Robles en la Universidad de León. La defensa de la tesis se realizó en León, España, el 20 de diciembre de 2023 ante un tribunal compuesto por el Dr. Luis Fernando D’Haro (Universidad Politécnica de Madrid, España), el Dr. Kenneth P. Camilleri (Universidad de Malta, Malta), y el Dr. Victor González Castro (Universidad de León, España). La tesis obtuvo la mención internacional tras una estancia de 3 meses en el Idiap Research Institute, en Suiza, bajo la supervisión del Dr. Petr Motlicek. La tesis obtuvo la calificación de sobresaliente cum laude.
Palabras clave: Procesamiento del habla, identificación de idiomas, identificación de acentos, identificación de hablantes.

1 Introduction

Speech-processing models have gained increasing importance across various fields, including law enforcement and cybersecurity. These models play a crucial role in the fight against crimes like child exploitation and human trafficking, helping in suspect identification and providing evidence in criminal investigations. They can also be used for other applications, such as speech recognition in personal assistants, voice control systems, and language learning tools.

However, speech processing models face numerous challenges, a major one being the scarcity of data. Acquiring sufficient and relevant speech data poses an obstacle, as it makes it difficult to train models that show accuracy and robustness on specific tasks. Moreover, these models are often complex and require significant computational resources for both the training and inference phases, resulting in significant costs and time investments.

The thesis focuses on three speech-processing tasks: language identification, accent identification, and speaker identification. All three tasks are crucial in academia, industry, and cybersecurity, being useful in tasks such as victim and fugitive identification and tracking, crime prevention, and suspect segmentation. In addition, they have the potential to improve automatic speech recognition systems, addressing the challenges of creating robust systems that are resilient to the particularities of speech in different regions.

ISSN 1135-5948 DOI 10.26342/2025-74-28 ©2025 Sociedad Española para el Procesamiento del Lenguaje Natural
In this thesis, we propose new techniques, models, and datasets to address the described speech processing tasks. They facilitate the creation of systems with state-of-the-art performance that require a relatively small amount of data and computational resources to train. Motivated by our collaboration with the European project “Global Response Against Child Exploitation” (GRACE), we focus on the creation of applications that are useful for law enforcement agencies in their fight against cybercrime and child sexual exploitation.

Several contributions presented in this thesis will be used by Europol and the national law enforcement agencies of the European Union countries. The objective of the GRACE project is the creation of tools that allow the monitoring and generation of automatic alerts in cases of possible risk involving minors.

Among the proposals of this thesis are new systems capable of achieving competitive results even though they have been trained with a limited amount of data. In addition, the thesis presents two new models that can be trained with limited computational resources while achieving results superior to those of other state-of-the-art models. We also include a new highly balanced dataset, and the experimental setup used in all the experiments, to allow reproducibility of the results and to make them comparable with future tools.

2 Thesis Overview

This thesis is composed of 6 chapters, which are described below:

Chapter 1 presents the objectives and motivations, and introduces the contributions of this thesis.

Chapter 2 contains a detailed review of state-of-the-art approaches related to language identification, accent identification, and speaker identification tasks, and related work on the proposed contributions. We also mention the main limitations of the methods reviewed and possible improvements that can be applied.

In Chapter 3, entitled “Improvement of accent classification models through Grad-Transfer from Spectrograms and Gradient-weighted Class Activation Mapping”, we present Grad-Transfer, a novel descriptor based on the concatenation of a flattened spectrogram and the dimensionality-reduced heatmaps generated with the Grad-CAM interpretability method (Selvaraju et al., 2020). This descriptor transfers knowledge extracted from a Convolutional Neural Network (CNN) specialized in accent identification to a Classical Machine Learning Algorithm (CMLA), improving the results of the CMLA by enriching the data it receives as input.

We used Grad-Transfer for the classification of native English accents and compared it with the results achieved by CMLA and state-of-the-art deep learning models fed only by spectrograms. Grad-Transfer is especially useful in data-poor tasks, where CMLA may give better results than larger models.

The description of the pipeline and the experimental results achieved were published in the IEEE/ACM Transactions on Audio, Speech, and Language Processing journal (Carofilis et al., 2023a).

Chapter 4, entitled “MeWEHV: Mel and Wave Embeddings for Human Voice Tasks”, presents a novel embedding enrichment procedure that combines the outputs of two concatenated models as independent branches of the same model: on the one hand, a branch with an embedding generation model fed by raw audio waves, called the wave encoder, and, on the other hand, a branch with a CNN fed by Mel-frequency cepstral coefficients (MFCCs) computed from the raw audio, called the MFCC encoder.

We designed an architecture, named MeWEHV, capable of interacting with the two branches through a set of layers, including LSTM layers and attention mechanisms, combining the information extracted from both representations. MeWEHV was tested on the language identification, accent identification, and speaker identification tasks.

We empirically evaluated the hypothesis that there is a complementarity between the embeddings of the wave encoder, this being a non-imposed representation of the acoustic information, and the embeddings of the MFCC encoder, generated from MFCCs, this being an imposed representation.

We presented a new speaker identification dataset, named YouSpeakers204, which is highly balanced in terms of speaker accent and gender. We compared the MeWEHV model with six state-of-the-art models on the proposed tasks using nine datasets, including

YouSpeakers204.

Details of the MeWEHV architecture, dataset information, and experimental results were published in the IEEE Access journal (Carofilis et al., 2023b).

Chapter 5, entitled “Squeeze-and-excitation for embeddings weighting in speech classification tasks”, presents the Squeeze-and-excitation for Embeddings Network (SaEENet), an update of the MeWEHV architecture. SaEENet is built using novel neural layers and several optimizations inspired by recent advances in other deep learning fields, such as depthwise separable convolutions (Chollet, 2017) and squeeze-and-excitation blocks (Hu et al., 2020), initially proposed in the image processing field, and GRU layers (Cho et al., 2014), originally used in text processing.

In the SaEENet model, we introduce a novel implementation of the squeeze-and-excitation block, which processes the stacked embeddings considering time as a dimension containing the target channels. Instead of weighting the relevance of 2D channels of a convolutional network, SaEENet weights each 1D embedding according to its relevance. This gives the next layer of the model the context of which embeddings are more relevant, reducing the impact of embeddings generated from audio segments that do not contain speech or contain unnecessary information, and increasing the relevance of the segments that contain information of interest to the model.

We compared SaEENet with other state-of-the-art models, including MeWEHV, using three datasets, for the language identification, accent identification, and speaker identification tasks.

The work in this chapter is described in an article that has been submitted to a journal.

Chapter 6 summarizes the conclusions of this thesis and provides an outlook for possible future research lines to extend the presented work.

3 Contributions

The main contributions of this thesis are presented below:

Grad-Transfer feature extractor. We introduced the new Grad-Transfer feature extractor to represent distinctive audio features that combine information from both the CNN-based class-discriminative localization technique Grad-CAM and spectrograms.

Novel accent classification approach. We proposed a new method for accent classification using Grad-Transfer, in which knowledge is transferred from a CNN to a CMLA, achieving better results than other state-of-the-art models. This is the first work in the literature to propose the use of a Grad-CAM-based method for knowledge transfer between machine learning models.

Benchmark setup for VCTK. We publicly present a setup for the Voice Cloning Toolkit (VCTK) dataset (Veaux et al., 2017) in the accent identification task, along with the results achieved by Grad-Transfer using that setup, so that researchers can use it to test their models and compare their results with those of this work.

Multi-representation audio pipeline. We introduced a new pipeline to generate rich embeddings by merging multiple audio representations. This approach establishes a basis for improving large pre-trained models and increasing their performance without the need for retraining all their weights.

MeWEHV model architecture. Based on this pipeline, we proposed the MeWEHV deep learning model architecture, which efficiently handles three speech classification tasks and achieves state-of-the-art performance on nine datasets. MeWEHV leverages the knowledge in the frozen weights of pre-trained speech processing models and improves their performance by enriching the embeddings they generate with information extracted from MFCCs, as a complementary representation. The MeWEHV architecture requires a relatively low number of trainable parameters, making it suitable for resource-constrained environments.

YouSpeakers204 dataset. We created a new dataset for speaker identification and accent identification, called YouSpeakers204, with 19607 audio clips and 204 speakers, built from public YouTube videos. The dataset is highly balanced according to the gender of the speakers and six accents: United States, Canada, Scotland, England, Ireland, and Australia.

Benchmarking Latin American Spanish Corpora. We used, for the first time in the literature, the publicly available Latin American Spanish Corpora dataset (Guevara-Rukoz et al., 2020) in the accent identification task, providing benchmark results with the systems we designed and an experimental setup made publicly available for reproducibility and future research.

SaEENet model architecture. We proposed SaEENet, a novel model architecture that achieves competitive results in speaker, language, and accent identification tasks. For the first time in the literature, we introduced the use of squeeze-and-excitation blocks to weight and filter compressed information in embeddings generated from audio clips.

Squeeze-and-excitation variants evaluation. We evaluated three variants of squeeze-and-excitation blocks and presented which variants work best for weighting embeddings of state-of-the-art models trained with self-supervised learning, and feature maps generated by a CNN.

State-of-the-art performance. We successfully outperformed the results of the MeWEHV model and other state-of-the-art models using the SaEENet architecture in the tasks of speaker identification, language identification, and accent identification. Among the other novelties of SaEENet are the use of depthwise separable convolution layers and GRU layers, which reduce the number of trainable parameters.

Real-world application. We applied the models and techniques developed in this work to real-world scenarios, focusing specifically on extracting speaker information to identify offenders and victims. This work contributes to the efforts of the GRACE project to leverage machine learning techniques to combat child sexual exploitation.

Acknowledgements

This work was supported in part by the European Union’s Horizon 2020 Research and Innovation Framework Programme under the Global Response Against Child Exploitation (GRACE) Project under Grant 883341; in part by the Predoctoral Grant of the Junta de Castilla y León, under Grant EDU/875/2021; and in part by the framework agreement between the University of León and Spanish National Cybersecurity Institute (INCIBE) under Addendum 01.

References

Carofilis, A., E. Alegre, E. Fidalgo, and L. Fernández-Robles. 2023a. Improvement of accent classification models through Grad-Transfer from spectrograms and gradient-weighted class activation mapping. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 31:2859–2871.

Carofilis, A., L. Fernández-Robles, E. Alegre, and E. Fidalgo. 2023b. MeWEHV: Mel and wave embeddings for human voice tasks. IEEE Access, 11:80089–80104.

Cho, K., B. van Merrienboer, D. Bahdanau, and Y. Bengio. 2014. On the properties of neural machine translation: Encoder-decoder approaches. In D. Wu, M. Carpuat, X. Carreras, and E. M. Vecchi, editors, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, pages 103–111. Association for Computational Linguistics.

Chollet, F. 2017. Xception: Deep learning with depthwise separable convolutions. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1800–1807. IEEE Computer Society.

Guevara-Rukoz, A., I. Demirsahin, F. He, S. C. Chu, S. Sarin, K. Pipatsrisawat, A. Gutkin, A. Butryna, and O. Kjartansson. 2020. Crowdsourcing Latin American Spanish for low-resource text-to-speech. In N. Calzolari, F. Béchet, P. Blache, K. Choukri, C. Cieri, T. Declerck, S. Goggi, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk, and S. Piperidis, editors, Proceedings of The 12th Language Resources and Evaluation Conference, LREC 2020, pages 6504–6513. European Language Resources Association.

Hu, J., L. Shen, S. Albanie, G. Sun, and E. Wu. 2020. Squeeze-and-excitation networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(8):2011–2023.

Selvaraju, R. R., M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra. 2020. Grad-CAM: Visual explanations from deep networks via gradient-based localization. International Journal of Computer Vision, 128(2):336–359.

Veaux, C., J. Yamagishi, K. MacDonald, and others. 2017. Superseded-CSTR VCTK corpus: English multi-speaker corpus for CSTR voice cloning toolkit.
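
As a rough illustration of the Grad-Transfer descriptor summarized in the Chapter 3 overview, the following pure-Python sketch concatenates a flattened spectrogram with a dimensionality-reduced Grad-CAM heatmap. The average-pooling reduction, the `pool` factor, and all function names are assumptions made for this example: the summary does not say how the heatmap is reduced, and the heatmap here is a hand-written stand-in rather than the output of a trained CNN.

```python
from typing import List

Matrix = List[List[float]]


def flatten(matrix: Matrix) -> List[float]:
    """Row-major flattening of a 2D matrix into a 1D list."""
    return [value for row in matrix for value in row]


def average_pool(matrix: Matrix, pool: int) -> Matrix:
    """Reduce a 2D map by averaging non-overlapping pool x pool blocks.

    Hypothetical choice of dimensionality reduction; both dimensions
    must be divisible by `pool`.
    """
    rows, cols = len(matrix), len(matrix[0])
    if rows % pool or cols % pool:
        raise ValueError("matrix dimensions must be divisible by pool")
    pooled = []
    for r in range(0, rows, pool):
        pooled_row = []
        for c in range(0, cols, pool):
            block = [matrix[r + i][c + j]
                     for i in range(pool) for j in range(pool)]
            pooled_row.append(sum(block) / len(block))
        pooled.append(pooled_row)
    return pooled


def grad_transfer_descriptor(spectrogram: Matrix, heatmap: Matrix,
                             pool: int = 2) -> List[float]:
    """Concatenate the flattened spectrogram with the reduced heatmap."""
    return flatten(spectrogram) + flatten(average_pool(heatmap, pool))


# Toy 2 x 4 inputs (frequency bins x time frames).
spectrogram = [[0.1, 0.2, 0.3, 0.4],
               [0.5, 0.6, 0.7, 0.8]]
heatmap = [[0.0, 1.0, 0.0, 0.0],
           [1.0, 0.0, 0.0, 2.0]]  # stand-in for a Grad-CAM map

descriptor = grad_transfer_descriptor(spectrogram, heatmap, pool=2)
```

The resulting vector (8 spectrogram values followed by 2 pooled heatmap values in this toy case) is the kind of enriched input that could then be passed to a classical classifier.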
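
The embedding-weighting idea described for SaEENet in the Chapter 5 overview can be sketched as a squeeze-and-excitation block in which each time step of the stacked embeddings plays the role of a channel. This is a minimal pure-Python sketch under stated assumptions, not the SaEENet implementation: the mean as the squeeze operation, the two-layer bottleneck with ReLU and sigmoid (borrowed from the standard block of Hu et al.), and the toy weight matrices `w1` and `w2` are all illustrative choices.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(weights: Matrix, vector: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def se_weight_embeddings(embeddings: Matrix, w1: Matrix, w2: Matrix) -> Matrix:
    """Reweight T stacked embeddings of size D.

    embeddings: T x D, w1: H x T (bottleneck), w2: T x H (expansion).
    """
    # Squeeze: summarize each 1D embedding (one per time step) by its mean.
    squeezed = [sum(e) / len(e) for e in embeddings]
    # Excitation: FC + ReLU, then FC + sigmoid, giving one weight per time step.
    hidden = [max(0.0, h) for h in matvec(w1, squeezed)]
    scales = [1.0 / (1.0 + math.exp(-z)) for z in matvec(w2, hidden)]
    # Scale: multiply every embedding by its relevance weight.
    return [[s * x for x in e] for s, e in zip(scales, embeddings)]


# Toy example with T=3 time steps, D=2 dimensions, H=1 hidden unit.
embeddings = [[1.0, 3.0], [0.0, 0.0], [2.0, 2.0]]
w1 = [[1.0, 0.0, 1.0]]
w2 = [[1.0], [-1.0], [0.0]]

weighted = se_weight_embeddings(embeddings, w1, w2)
```

In a trained model the weights would be learned, so that embeddings from non-speech segments receive scales close to zero while informative segments keep scales close to one.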
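
To make the parameter savings from depthwise separable convolutions (Chollet, 2017) concrete, the short sketch below compares weight counts for a standard 1D convolution and its depthwise separable counterpart. The layer sizes are hypothetical and biases are ignored; none of these numbers come from the thesis.

```python
def standard_conv1d_params(kernel_size: int, in_channels: int,
                           out_channels: int) -> int:
    """Weights in a standard 1D convolution (bias ignored)."""
    return kernel_size * in_channels * out_channels


def separable_conv1d_params(kernel_size: int, in_channels: int,
                            out_channels: int) -> int:
    """Weights in a depthwise separable 1D convolution (bias ignored)."""
    depthwise = kernel_size * in_channels   # one filter per input channel
    pointwise = in_channels * out_channels  # 1x1 convolution mixing channels
    return depthwise + pointwise


# Hypothetical layer: kernel size 3, 64 input channels, 128 output channels.
standard = standard_conv1d_params(3, 64, 128)
separable = separable_conv1d_params(3, 64, 128)
print(f"standard: {standard}, separable: {separable}")
```

For this hypothetical layer the separable version uses 8384 weights instead of 24576, roughly a threefold reduction, and the gap widens as the kernel size grows.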
