


IEEE/ACM Transactions on Audio, Speech and Language Processing, Volume 32
Volume 32, 2024
- Jin Chu Wu, Raghu N. Kacker: Statistical Analysis for Speaker Recognition Evaluation With Data Dependence and Three Score Distributions. 1-14
- Yongwei Zhou, Junwei Bao, Youzheng Wu, Xiaodong He, Tiejun Zhao: Operation-Augmented Numerical Reasoning for Question Answering. 15-28
- Anurenjan Purushothaman, Debottam Dutta, Rohit Kumar, Sriram Ganapathy: Speech Dereverberation With Frequency Domain Autoregressive Modeling. 29-38
- Leyuan Qu, Taihao Li, Cornelius Weber, Theresa Pekarek-Rosin, Fuji Ren, Stefan Wermter: Disentangling Prosody Representations With Unsupervised Speech Reconstruction. 39-54
- Mathias Bach Pedersen, Søren Holdt Jensen, Zheng-Hua Tan, Jesper Jensen: Data-Driven Non-Intrusive Speech Intelligibility Prediction Using Speech Presence Probability. 55-67
- Yuanbo Hou, Bo Kang, Andrew Mitchell, Wenwu Wang, Jian Kang, Dick Botteldooren: Cooperative Scene-Event Modelling for Acoustic Scene Classification. 68-82
- Xiaotong Jiang, Peiwen You, Chen Chen, Zhongqing Wang, Guodong Zhou: Exploring Scope Detection for Aspect-Based Sentiment Analysis. 83-94
- Xuenan Xu, Zeyu Xie, Mengyue Wu, Kai Yu: Beyond the Status Quo: A Contemporary Survey of Advances and Challenges in Audio Captioning. 95-112
- Federico Miotello, Mirco Pezzoli, Luca Comanducci, Fabio Antonacci, Augusto Sarti: Deep Prior-Based Audio Inpainting Using Multi-Resolution Harmonic Convolutional Neural Networks. 113-123
- Cristian Lucian Stanciu, Jacob Benesty, Constantin Paleologu, Ruxandra-Liana Costea, Laura-Maria Dogariu, Silviu Ciochina: Decomposition-Based Wiener Filter Using the Kronecker Product and Conjugate Gradient Method. 124-138
- Huiyao Chen, Yueheng Sun, Meishan Zhang, Min Zhang: Automatic Noise Generation and Reduction for Text Classification. 139-150
- Jiaming Xu, Jian Cui, Yunzhe Hao, Bo Xu: Multi-Cue Guided Semi-Supervised Learning Toward Target Speaker Separation in Real Environments. 151-163
- Yang Xiang, Jesper Lisby Højvang, Morten Højfeldt Rasmussen, Mads Græsbøll Christensen: A Two-Stage Deep Representation Learning-Based Speech Enhancement Method Using Variational Autoencoder and Adversarial Training. 164-177
- Xiao Li, Ruirui Liu, Huichou Huang, Qingyao Wu: Contrastive Learning for Target Speaker Extraction With Attention-Based Fusion. 178-188
- Xiaobo Liang, Runze Mao, Lijun Wu, Juntao Li, Min Zhang, Qing Li: Enhancing Low-Resource NLP by Consistency Training With Data and Model Perturbations. 189-199
- Haisheng Lu, Jiangnan Liang, Chuang Shi: Comments on "Primary-Ambient Extraction Using Ambient Spectrum Estimation for Immersive Spatial Audio Reproduction". 200-202
- Szymon Drgas, Lars Bramsløw, Archontis Politis, Gaurav Naithani, Tuomas Virtanen: Dynamic Processing Neural Network Architecture for Hearing Loss Compensation. 203-214
- Femke B. Gelderblom, Tron V. Tronstad, Torbjørn Svendsen, Tor André Myrvoll: On the Predictive Power of Objective Intelligibility Metrics for the Subjective Performance of Deep Complex Convolutional Recurrent Speech Enhancement Networks. 215-226
- Thomas Haubner, Andreas Brendel, Walter Kellermann: End-to-End Deep Learning-Based Adaptation Control for Linear Acoustic Echo Cancellation. 227-238
- Congcong Jiang, Tieyun Qian, Bing Liu: One General Teacher for Multi-Data Multi-Task: A New Knowledge Distillation Framework for Discourse Relation Analysis. 239-249
- Khandokar Md. Nayem, Donald S. Williamson: Attention-Based Speech Enhancement Using Human Quality Perception Modeling. 250-260
- Ying Zhang, Fandong Meng, Yufeng Chen, Jinan Xu, Jie Zhou: Complex Question Enhanced Transfer Learning for Zero-Shot Joint Information Extraction. 261-275
- Jingsong Yan, Piji Li, Haibin Chen, Junhao Zheng, Qianli Ma: Does the Order Matter? A Random Generative Way to Learn Label Hierarchy for Hierarchical Text Classification. 276-285
- Georgios Paraskevopoulos, Theodoros Kouzelis, Georgios Rouvalis, Athanasios Katsamanis, Vassilis Katsouros, Alexandros Potamianos: Sample-Efficient Unsupervised Domain Adaptation of Speech Recognition Systems: A Case Study for Modern Greek. 286-299
- Ernesto Accolti, Javier Gimenez, Michael Vorländer: Uncertainties of Room Acoustics Simulation Due to Directivity Data of Musical Instruments. 300-309
- Yoshiki Masuyama, Kouei Yamaoka, Yuma Kinoshita, Taishi Nakashima, Nobutaka Ono: Causal and Relaxed-Distortionless Response Beamforming for Online Target Source Extraction. 310-324
- Rohit Prabhavalkar, Takaaki Hori, Tara N. Sainath, Ralf Schlüter, Shinji Watanabe: End-to-End Speech Recognition: A Survey. 325-351
- Yun Zhao, Dexi Liu, Changxuan Wan, Xiping Liu, Jian-Yun Nie, Jiaming Liu: JMS-QA: A Joint Hierarchical Architecture for Mental Health Question Answering. 352-363
- Shiwen Ni, Jiawen Li, Min Yang, Hung-Yu Kao: DropAttack: A Random Dropped Weight Attack Adversarial Training for Natural Language Understanding. 364-373
- Tiantian Zhu, Yang Qin, Ming Feng, Qingcai Chen, Baotian Hu, Yang Xiang: BioPRO: Context-Infused Prompt Learning for Biomedical Entity Linking. 374-385
- Jiapu Wang, Boyue Wang, Junbin Gao, Simin Hu, Yongli Hu, Baocai Yin: Multi-Level Interaction Based Knowledge Graph Completion. 386-396
- Qiangqiang Zhang, Dongyuan Lin, Yingying Xiao, Yunfei Zheng, Shiyuan Wang: Error Reused Filtered-X Least Mean Square Algorithm for Active Noise Control. 397-412
- Zengrui Jin, Mengzhe Geng, Jiajun Deng, Tianzi Wang, Shujie Hu, Guinan Li, Xunying Liu: Personalized Adversarial Data Augmentation for Dysarthric and Elderly Speech Recognition. 413-429
- Jun Kong, Jin Wang, Xuejie Zhang: Adaptive Ensemble Self-Distillation With Consistent Gradients for Fast Inference of Pretrained Language Models. 430-442
- Srdan Kitic, Jérôme Daniel: Blind Identification of Ambisonic Reduced Room Impulse Response. 443-458
- Qijie Shao, Pengcheng Guo, Jinghao Yan, Pengfei Hu, Lei Xie: Decoupling and Interacting Multi-Task Learning Network for Joint Speech and Accent Recognition. 459-470
- Han Zhu, Gaofeng Cheng, Jindong Wang, Wenxin Hou, Pengyuan Zhang, Yonghong Yan: Boosting Cross-Domain Speech Recognition With Self-Supervision. 471-485
- Yile Wang, Yue Zhang, Peng Li, Yang Liu: Gradual Syntactic Label Replacement for Language Model Pre-Training. 486-496
- Penghui Ma, Jianfeng Li, Jingjing Pan, Xiaofei Zhang, Roberto Gil-Pita: Coherent Signal DOA Estimation With Coprime Array: Exploiting Signal Subspace Reconstructing Strategy. 497-508
- Emma Hamel, Nickvash Kani: Factors That Influence Automatic Recognition of African-American Vernacular English in Machine-Learning Models. 509-516
- Jingbei Li, Sipan Li, Ping Chen, Luwen Zhang, Yi Meng, Zhiyong Wu, Helen Meng, Qiao Tian, Yuping Wang, Yuxuan Wang: Joint Multiscale Cross-Lingual Speaking Style Transfer With Bidirectional Attention Mechanism for Automatic Dubbing. 517-528
- Bing Han, Zhengyang Chen, Yanmin Qian: Self-Supervised Learning With Cluster-Aware-DINO for High-Performance Robust Speaker Verification. 529-541
- Kristina Tesch, Timo Gerkmann: Multi-Channel Speech Separation Using Spatially Selective Deep Non-Linear Filters. 542-553
- Hao-Chen Pei, Hao Fang, Xin Luo, Xin-Shun Xu: Gradformer: A Framework for Multi-Aspect Multi-Granularity Pronunciation Assessment. 554-563
- Garima Sharma, Karthikeyan Umapathy, Sridhar Krishnan: Time-Frequency Scattergrams for Biomedical Audio Signal Representation and Classification. 564-576
- Zhibo Man, Zengcheng Huang, Yujie Zhang, Yu Li, Yuanmeng Chen, Yufeng Chen, Jinan Xu: WDSRL: Multi-Domain Neural Machine Translation With Word-Level Domain-Sensitive Representation Learning. 577-590
- Chin-Po Chen, Ho-Hsien Pan, Susan Shur-Fen Gau, Chi-Chun Lee: Using Measures of Vowel Space for Autistic Traits Characterization. 591-607
- Kevin Wilkinghoff, Frank Kurth: Why Do Angular Margin Losses Work Well for Semi-Supervised Anomalous Sound Detection? 608-622
- Aku Rouhe, Tamás Grósz, Mikko Kurimo: Principled Comparisons for End-to-End Speech Recognition: Attention vs Hybrid at the 1000-Hour Scale. 623-638
- Yile Wang, Yue Zhang: Lost in Context? On the Sense-Wise Variance of Contextualized Word Embeddings. 639-650
- Christoph Hold, Ville Pulkki, Archontis Politis, Leo McCormack: Compression of Higher-Order Ambisonic Signals Using Directional Audio Coding. 651-665
- Shouhui Wang, Biao Qin: A Novel Joint Training Model for Knowledge Base Question Answering. 666-679
- Songbin Li, Jingang Wang, Peng Liu, Ke Shi: SANet: A Compressed Speech Encoder and Steganography Algorithm Independent Steganalysis Deep Neural Network. 680-690
- Tarek Kanan, Amani AbedAlghafer, Shadi AlZu'bi, Bilal Hawashin, Ala Mughaid, Ghassan Kanaan, M. M. Kamruzzaman: An Intelligent Health Care System for Detecting Drug Abuse in Social Media Platforms Based on Low Resource Language. 691-703
- Alejandro Santorum Varela, Svetlana Stoyanchev, Simon Keizer, Rama Doddipatla, Kate M. Knill: Entity Resolution in Situated Dialog With Unimodal and Multimodal Transformers. 704-713
- Huang He, Hua Lu, Siqi Bao, Fan Wang, Hua Wu, Zheng-Yu Niu, Haifeng Wang: Learning to Select External Knowledge With Multi-Scale Negative Sampling. 714-720
- Hua Lu, Zhen Guo, Chanjuan Li, Yunyi Yang, Huang He, Siqi Bao: Towards Building an Open-Domain Dialogue System Incorporated With Internet Memes. 721-726
- Jungwoo Lim, Taesun Whang, Dongyub Lee, Heuiseok Lim: Adaptive Multi-Domain Dialogue State Tracking on Spoken Conversations. 727-732
- David Thulke, Nico Daheim, Christian Dugast, Hermann Ney: Task-Oriented Document-Grounded Dialog Systems by HLTPR@RWTH for DSTC9 and DSTC10. 733-741
- Han Wu, Kun Xu, Linqi Song: Structure-Aware Dialogue Modeling Methods for Conversational Semantic Role Labeling. 742-752
- Zhe Chen, Hongcheng Liu, Yu Wang: DialogMCF: Multimodal Context Flow for Audio Visual Scene-Aware Dialog. 753-764
- Koichiro Yoshino, Yun-Nung Chen, Paul A. Crook, Satwik Kottur, Jinchao Li, Behnam Hedayatnia, Seungwhan Moon, Zhengcong Fei, Zekang Li, Jinchao Zhang, Yang Feng, Jie Zhou, Seokhwan Kim, Yang Liu, Di Jin, Alexandros Papangelis, Karthik Gopalakrishnan, Dilek Hakkani-Tur, Babak Damavandi, Alborz Geramifard, Chiori Hori, Ankit Shah, Chen Zhang, Haizhou Li, João Sedoc, Luis F. D'Haro, Rafael E. Banchs, Alexander Rudnicky: Overview of the Tenth Dialog System Technology Challenge: DSTC10. 765-778
- Shekhar Kumar Yadav, Nithin V. George: Joint Dereverberation and Beamforming With Blind Estimation of the Shape Parameter of the Desired Source Prior. 779-793
- Yanxiong Li, Zhongjie Jiang, Qisheng Huang, Wenchang Cao, Jialong Li: Lightweight Speaker Verification Using Transformation Module With Feature Partition and Fusion. 794-806
- Yuhan Dai, Zhirui Zhang, Yichao Du, Shengcai Liu, Lemao Liu, Tong Xu: Datastore Distillation for Nearest Neighbor Machine Translation. 807-817
- Changtao Li, Feiran Yang, Jun Yang: A Two-Stage Approach to Quality Restoration of Bone-Conducted Speech. 818-829
- Jie Zhou, Yuanbiao Lin, Qin Chen, Qi Zhang, Xuanjing Huang, Liang He: CausalABSC: Causal Inference for Aspect Debiasing in Aspect-Based Sentiment Classification. 830-840
- Ruiying Lu, Bo Chen, Dandan Guo, Dongsheng Wang, Mingyuan Zhou: Hierarchical Topic-Aware Contextualized Transformers. 841-852
- Yaru Zhao, Bo Cheng, Yakun Huang, Zhiguo Wan: FluGCF: A Fluent Dialogue Generation Model With Coherent Concept Entity Flow. 853-867
- Changhao Ding, Zhangjie Fu, Zhongliang Yang, Qi Yu, Daqiu Li, Yongfeng Huang: Context-Aware Linguistic Steganography Model Based on Neural Machine Translation. 868-878
- Zainab Alhakeem, Se-In Jang, Hong-Goo Kang: Disentangled Representations in Local-Global Contexts for Arabic Dialect Identification. 879-890
- Jae-Hong Lee, Joon-Hyuk Chang: Partitioning Attention Weight: Mitigating Adverse Effect of Incorrect Pseudo-Labels for Self-Supervised ASR. 891-905
- Ryo Fukuda, Katsuhito Sudoh, Satoshi Nakamura: Improving Speech Translation Accuracy and Time Efficiency With Fine-Tuned wav2vec 2.0-Based Speech Segmentation. 906-916
- Seong-Gyun Leem, Daniel Fulford, Jukka-Pekka Onnela, David Gard, Carlos Busso: Selective Acoustic Feature Enhancement for Speech Emotion Recognition With Noisy Speech. 917-929
- Alexander Bohlender, Ann Spriet, Wouter Tirry, Nilesh Madhu: Spatially Selective Speaker Separation Using a DNN With a Location Dependent Feature Extraction. 930-945
- Matan Karo, Arie Yeredor, Itshak Lapidot: Compact Time-Domain Representation for Logical Access Spoofed Audio. 946-958
- Or Berebi, Zamir Ben-Hur, David Lou Alon, Boaz Rafaely: Analysis and Design of Head-Tracked Compensation for Bilateral Ambisonics. 959-972
- Wei Wang, Yanmin Qian: Universal Cross-Lingual Data Generation for Low Resource ASR. 973-983
- Davide Berghi, Philip J. B. Jackson: Leveraging Visual Supervision for Array-Based Active Speaker Detection and Localization. 984-995
- Daniel Aleksander Krause, Guillermo García-Barrios, Archontis Politis, Annamaria Mesaros: Binaural Sound Source Distance Estimation and Localization for a Moving Listener. 996-1011
- Seung-Bin Kim, Sang-Hoon Lee, Ha-Yeong Choi, Seong-Whan Lee: Audio Super-Resolution With Robust Speech Representation Learning of Masked Autoencoder. 1012-1022
- Omer Musa Battal, Aykut Koç: Automatic Construction of Sememe Knowledge Bases From Machine Readable Dictionaries. 1023-1035
- Varun Krishna, Tarun Sai, Sriram Ganapathy: Representation Learning With Hidden Unit Clustering for Low Resource Speech Applications. 1036-1047
- Zhengding Luo, Dongyuan Shi, Woon-Seng Gan, Qirui Huang: Delayless Generative Fixed-Filter Active Noise Control Based on Deep Learning and Bayesian Filter. 1048-1060
- Zewen Chi, Heyan Huang, Luyang Liu, Yu Bai, Xiaoyan Gao, Xian-Ling Mao: Can Pretrained English Language Models Benefit Non-English NLP Systems in Low-Resource Scenarios? 1061-1074
- Rui Liu, Yifan Hu, Haolin Zuo, Zhaojie Luo, Longbiao Wang, Guanglai Gao: Text-to-Speech for Low-Resource Agglutinative Language With Morphology-Aware Language Model Pre-Training. 1075-1087
- Shu Jiang, Zuchao Li, Hai Zhao, Weiping Ding: Entity-Relation Extraction as Full Shallow Semantic Dependency Parsing. 1088-1099
- Yoav Vered, Stephen J. Elliott: A Parallel Analog and Digital Adaptive Feedforward Controller for Active Noise Control. 1100-1108
- Puning Zhang, Rongjian Zhao, Boran Yang, Yuexian Li, Zhigang Yang: Integrated Syntactic and Semantic Tree for Targeted Sentiment Classification Using Dual-Channel Graph Convolutional Network. 1109-1124
- Xu Wang, Hainan Zhang, Shuai Zhao, Hongshen Chen, Zhuoye Ding, Zhiguo Wan, Bo Cheng, Yanyan Lan: Debiasing Counterfactual Context With Causal Inference for Multi-Turn Dialogue Reasoning. 1125-1132
- Hoang Ngoc Chau, Tien Dat Bui, Huu Binh Nguyen, Thanh Thi Hien Duong, Quoc-Cuong Nguyen: A Novel Approach to Multi-Channel Speech Enhancement Based on Graph Neural Networks. 1133-1144
- Yuchen Hu, Chen Chen, Qiushi Zhu, Eng Siong Chng: Wav2code: Restore Clean Speech Representations via Codebook Lookup for Noise-Robust ASR. 1145-1156
- Tetsuya Ueda, Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Shoko Araki, Shoji Makino: Blind and Spatially-Regularized Online Joint Optimization of Source Separation, Dereverberation, and Noise Reduction. 1157-1172
- Vibhav Agarwal, Sourav Ghosh, Harichandana B. S. S, Himanshu Arora, Barath Raj Kandur Raja: TrICy: Trigger-Guided Data-to-Text Generation With Intent Aware Attention-Copy. 1173-1184
- Christoph Böddeker, Aswin Shanmugam Subramanian, Gordon Wichern, Reinhold Haeb-Umbach, Jonathan Le Roux: TS-SEP: Joint Diarization and Separation Conditioned on Estimated Speaker Embeddings. 1185-1197
- Reza Varzandeh, Simon Doclo, Volker Hohmann: Speech-Aware Binaural DOA Estimation Utilizing Periodicity and Spatial Features in Convolutional Neural Networks. 1198-1213
- Yigitcan Özer, Meinard Müller: Source Separation of Piano Concertos Using Musically Motivated Augmentation Techniques. 1214-1225
- Lior Frenkel, Shlomo E. Chazan, Jacob Goldberger: Domain Adaptation Using Suitable Pseudo Labels for Speech Enhancement and Dereverberation. 1226-1236
- Jiahao Zhao, Wenji Mao, Daniel Dajun Zeng: Disentangled Text Representation Learning With Information-Theoretic Perspective for Adversarial Robustness. 1237-1247
- Dong Zhou, Fang Lei, Lin Li, Yongmei Zhou, Aimin Yang: Cross-Modal Interaction via Reinforcement Feedback for Audio-Lyrics Retrieval. 1248-1260
- Xuechen Liu, Md. Sahidullah, Kong Aik Lee, Tomi Kinnunen: Generalizing Speaker Verification for Spoof Awareness in the Embedding Space. 1261-1273
- Shiyao Cui, Jiangxia Cao, Xin Cong, Jiawei Sheng, Quangang Li, Tingwen Liu, Jinqiao Shi: Enhancing Multimodal Entity and Relation Extraction With Variational Information Bottleneck. 1274-1285
- Yizhou Tan, Haojun Ai, Shengchen Li, Mark D. Plumbley: Acoustic Scene Classification Across Cities and Devices via Feature Disentanglement. 1286-1297
- Orel Ben Zaken, Anurag Kumar, Vladimir Tourbabin, Boaz Rafaely: Neural-Network-Based Direction-of-Arrival Estimation for Reverberant Speech - The Importance of Energetic, Temporal, and Spatial Information. 1298-1309
- Changsheng Quan, Xiaofei Li: SpatialNet: Extensively Learning Spatial Information for Multichannel Joint Speech Separation, Denoising and Dereverberation. 1310-1323
- Matthew Baas, Herman Kamper: Disentanglement in a GAN for Unconditional Speech Synthesis. 1324-1335
- Xian Li, Nian Shao, Xiaofei Li: Self-Supervised Audio Teacher-Student Transformer for Both Clip-Level and Frame-Level Tasks. 1336-1351
- Yifan Chen, Gaofeng Cheng, Runyan Yang, Pengyuan Zhang, Yonghong Yan: Interrelate Training and Clustering for Online Speaker Diarization. 1352-1364
- Sheng Feng, Xiaoqian Zhu, Shuqing Ma: Masking Hierarchical Tokens for Underwater Acoustic Target Recognition With Self-Supervised Learning. 1365-1379
- Yangyang Zhao, Kai Yin, Zhenyu Wang, Mehdi Dastani, Shihan Wang: Decomposed Deep Q-Network for Coherent Task-Oriented Dialogue Policy Learning. 1380-1391
- Jayneel Parekh, Sanjeel Parekh, Pavlo Mozharovskyi, Gaël Richard, Florence d'Alché-Buc: Tackling Interpretability in Audio Classification Networks With Non-negative Matrix Factorization. 1392-1405
- Xiuying Chen, Shen Gao, Mingzhe Li, Qingqing Zhu, Xin Gao, Xiangliang Zhang: Write Summary Step-by-Step: A Pilot Study of Stepwise Summarization. 1406-1415
- Changkai Lin, Hongju Cheng, Qiang Rao, Yang Yang: M³SA: Multimodal Sentiment Analysis Based on Multi-Scale Feature Extraction and Multi-Task Learning. 1416-1429
- Ritujoy Biswas, Karan Nathwani, Vinayak Abrol: Statistically Guided Near-End Speech Intelligibility Improvement Through Voice Transformation and Transfer Learning. 1445-1456
- Linhui Sun, Shuo Yuan, Aifei Gong, Lei Ye, Eng Siong Chng: Dual-Branch Modeling Based on State-Space Model for Speech Enhancement. 1457-1467
- Alkis Koudounas, Eliana Pastor, Giuseppe Attanasio, Vittorio Mazzia, Manuel Giollo, Thomas Gueudré, Elisa Reale, Luca Cagliero, Sandro Cumani, Luca de Alfaro, Elena Baralis, Daniele Amberti: Towards Comprehensive Subgroup Performance Analysis in Speech Models. 1468-1480
- Wenmeng Xiong, Changchun Bao, Jing Zhou, Maoshen Jia, José Picheral: Joint DOA Estimation and Dereverberation Based on Multi-Channel Linear Prediction Filtering and Azimuth Sparsity. 1481-1493
- Rui-Chen Zheng, Yang Ai, Zhen-Hua Ling: Incorporating Ultrasound Tongue Images for Audio-Visual Speech Enhancement. 1430-1444
- Yehav Alkaher, Israel Cohen: Howling Detection and Gain Control for Speech Reinforcement in a Noisy Car Cabin Environment. 1494-1505
- Xinfa Zhu, Yi Lei, Tao Li, Yongmao Zhang, Hongbin Zhou, Heng Lu, Lei Xie: METTS: Multilingual Emotional Text-to-Speech by Cross-Speaker and Cross-Lingual Emotion Transfer. 1506-1518
- Myeonghun Jeong, Minchan Kim, Byoung Jin Choi, Jaesam Yoon, Won Jang, Nam Soo Kim: Transfer Learning for Low-Resource, Multi-Lingual, and Zero-Shot Multi-Speaker Text-to-Speech. 1519-1530
- Jiadi Yao, Hong Luo, Jun Qi, Xiao-Lei Zhang: Interpretable Spectrum Transformation Attacks to Speaker Recognition Systems. 1531-1545
- Xiang Chen, Lei Li, Yuqi Zhu, Shumin Deng, Chuanqi Tan, Fei Huang, Luo Si, Ningyu Zhang, Huajun Chen: Sequence Labeling as Non-Autoregressive Dual-Query Set Generation. 1546-1558
- Lei Liu, Li Liu, Haizhou Li: Computation and Parameter Efficient Multi-Modal Fusion Transformer for Cued Speech Recognition. 1559-1572
- Adrián Barahona-Ríos, Tom Collins: NoiseBandNet: Controllable Time-Varying Neural Synthesis of Sound Effects Using Filterbanks. 1573-1585
- Siyuan Wang, Zhongyu Wei, Jiarong Xu, Taishan Li, Zhihao Fan: Unifying Structure Reasoning and Language Pre-Training for Complex Reasoning Tasks. 1586-1595
- Yijing Chu, Sipei Zhao, Feng Niu, Yongzheng Dong, Yuezhe Zhao: A New Diffusion Filtered-X Affine Projection Algorithm: Performance Analysis and Application in Windy Environment. 1596-1608
- Yuquan Le, Zhe Quan, Jiawei Wang, Da Cao, Kenli Li: R²: A Novel Recall & Ranking Framework for Legal Judgment Prediction. 1609-1622
- Xiaotong Jiang, Ruirui Bai, Zhongqing Wang, Guodong Zhou: Cross-Domain Aspect-Based Sentiment Classification With Tripartite Graph Modeling. 1623-1635
- Zhengyang Chen, Bing Han, Shuai Wang, Yanmin Qian: Attention-Based Encoder-Decoder End-to-End Neural Diarization With Embedding Enhancer. 1636-1649
- Chenfeng Miao, Qingying Zhu, Minchuan Chen, Jun Ma, Shaojun Wang, Jing Xiao: EfficientTTS 2: Variational End-to-End Text-to-Speech Synthesis and Voice Conversion. 1650-1661
- Orel Peretz, Israel Cohen: Constant Elevation-Beamwidth Beamforming With Concentric Ring Arrays. 1662-1672
- Zhibin Quan, Chi-Man Vong, Weili Zeng, Wankou Yang: The MorPhEMe Machine: An Addressable Neural Memory for Learning Knowledge-Regularized Deep Contextualized Chinese Embedding. 1673-1686
- Lijian Gao, Qirong Mao, Ming Dong: On Local Temporal Embedding for Semi-Supervised Sound Event Detection. 1687-1698
- Xuehao Zhou, Mingyang Zhang, Yi Zhou, Zhizheng Wu, Haizhou Li: Accented Text-to-Speech Synthesis With Limited Data. 1699-1711
- Vinay Kothapally, John H. L. Hansen: Monaural Speech Dereverberation Using Deformable Convolutional Networks. 1712-1723
- Taihui Wang, Feiran Yang, Jun Yang: Multichannel Linear Prediction-Based Speech Dereverberation Considering Sparse and Low-Rank Priors. 1724-1735
- Saurabh Kataria, Jesús Villalba, Laureano Moro-Velázquez, Piotr Zelasko, Najim Dehak: Time-Domain Speech Super-Resolution With GAN Based Modeling for Telephony Speaker Verification. 1736-1749
- Marco Olivieri, Amy Bastine, Mirco Pezzoli, Fabio Antonacci, Thushara D. Abhayapala, Augusto Sarti: Acoustic Imaging With Circular Microphone Array: A New Approach for Sound Field Analysis. 1750-1761
- Tengfei Liu, Yongli Hu, Junbin Gao, Yanfeng Sun, Baocai Yin: Hierarchical Multi-Granularity Interaction Graph Convolutional Network for Long Document Classification. 1762-1775
- Etienne Thuillier, Craig T. Jin, Vesa Välimäki: HRTF Interpolation Using a Spherical Neural Process Meta-Learner. 1790-1802
- Xun Gong, Yu Wu, Jinyu Li, Shujie Liu, Rui Zhao, Xie Chen, Yanmin Qian: Advanced Long-Content Speech Recognition With Factorized Neural Transducer. 1803-1815
- Yoshiki Masuyama, Kouei Yamaoka, Takao Kawamura, Nobutaka Ono: Efficient Joint Optimization of Sampling Rate Offsets Using Entire Multichannel Signal. 1816-1828
- Takaaki Saeki, Soumi Maiti, Xinjian Li, Shinji Watanabe, Shinnosuke Takamichi, Hiroshi Saruwatari: Text-Inductive Graphone-Based Language Adaptation for Low-Resource Speech Synthesis. 1829-1844
- Douglas D. O'Shaughnessy: Review of Methods for Automatic Speaker Verification. 1776-1789
- Yingming Gao, Peter Birkholz, Ya Li: Articulatory Copy Synthesis Based on the Speech Synthesizer VocalTractLab and Convolutional Recurrent Neural Networks. 1845-1858
- Théo Mariotte, Anthony Larcher, Silvio Montrésor, Jean-Hugh Thomas: Channel-Combination Algorithms for Robust Distant Voice Activity and Overlapped Speech Detection. 1859-1872
- Luciana M. X. de Souza, Márcio H. Costa, Renata Coelho Borges: Envelope-Based Multichannel Noise Reduction for Cochlear Implant Applications. 1873-1884
- Linjian Li, Yi Cai, Xin Wu: Unsupervised Disentanglement Learning Model for Exemplar-Guided Paraphrase Generation. 1885-1900
- Amir Ivry, Israel Cohen, Baruch Berdugo: A User-Centric Approach for Deep Residual-Echo Suppression in Double-Talk. 1901-1914
- Geng Zhang, Jin Liu, Guangyou Zhou, Kunsong Zhao, Zhiwen Xie, Bo Huang: Question-Directed Reasoning With Relation-Aware Graph Attention Network for Complex Question Answering Over Knowledge Graph. 1915-1927
- Yu Yao, Peng Yang, Guangzhen Zhao, Guoshun Yin: KGAgent: Learning a Deep Reinforced Agent for Keyphrase Generation. 1928-1940
- Jiahong Li, Chenda Li, Yifei Wu, Yanmin Qian: Unified Cross-Modal Attention: Robust Audio-Visual Speech Recognition and Beyond. 1941-1953
- Mieszko Fras, Konrad Kowalczyk: Reverberant Source Separation Using NTF With Delayed Subsources and Spatial Priors. 1954-1967
- Rui Wang, Li Li, Tomoki Toda: Dual-Channel Target Speaker Extraction Based on Conditional Variational Autoencoder and Directional Information. 1968-1979
- Qinyu Han, Zhihao Yang, Hongfei Lin, Tian Qin: Let Topic Flow: A Unified Topic-Guided Segment-Wise Dialogue Summarization Framework. 2021-2032
- Haonan Cheng, Shulin Liu, Zhicheng Lian, Long Ye, Qin Zhang: MusicECAN: An Automatic Denoising Network for Music Recordings With Efficient Channel Attention. 2033-2049
- Guy Gubnitky, Roee Diamant: Detecting the Presence of Sperm Whales' Echolocation Clicks in Noisy Environments. 2050-2061
- Yuxia Wu, Tianhao Dai, Zhedong Zheng, Lizi Liao: Active Discovering New Slots for Task-Oriented Conversation. 2062-2072
- Jacob Hollebon, Filippo Maria Fazi: Dynamic Higher-Order Stereophony. 2073-2084
- Aidan O. T. Hogg, Mads Jenkins, He Liu, Isaac Squires, Samuel J. Cooper, Lorenzo Picinali: HRTF Upsampling With a Generative Adversarial Network Using a Gnomonic Equiangular Projection. 2085-2099
- Yusheng Liao, Yanfeng Wang, Yu Wang: Leveraging Diverse Modeling Contexts With Collaborating Learning for Neural Machine Translation. 2100-2111
- Shuo Li, Xiaojun Bi, Tao Liu, Zheng Chen: Information Dropping Data Augmentation for Machine Translation Quality Estimation. 2112-2124
- Shuoran Jiang, Qingcai Chen, Yang Xiang, Youcheng Pan, Xiangping Wu: BaSFormer: A Balanced Sparsity Regularized Attention Network for Transformer. 2125-2140
- Morgan Buisson, Brian McFee, Slim Essid, Hélène C. Crayencour: Self-Supervised Learning of Multi-Level Audio Representations for Music Segmentation. 2141-2152
- Cong Ma, Xu Han, Linghui Wu, Yaping Zhang, Yang Zhao, Yu Zhou, Chengqing Zong: Modal Contrastive Learning Based End-to-End Text Image Machine Translation. 2153-2165
- Ruiyu Liang, Yue Xie, Jiaming Cheng, Cong Pang, Björn W. Schuller: A Non-Invasive Speech Quality Evaluation Algorithm for Hearing Aids With Multi-Head Self-Attention and Audiogram-Based Features. 2166-2176
- Ziqiang Zhang, Sanyuan Chen, Long Zhou, Yu Wu, Shuo Ren, Shujie Liu, Zhuoyuan Yao, Xun Gong, Li-Rong Dai, Jinyu Li, Furu Wei: SpeechLM: Enhanced Speech Pre-Training With Unpaired Textual Data. 2177-2187
- Rui Liu, Berrak Sisman, Guanglai Gao, Haizhou Li: Controllable Accented Text-to-Speech Synthesis With Fine and Coarse-Grained Intensity Rendering. 2188-2201
- Kshitij Mishra, Mauajama Firdaus, Asif Ekbal: Please Donate to Save a Life: Inducing Politeness to Handle Resistance in Persuasive Dialogue Agents. 2202-2212
- Hirokazu Kameoka, Takuhiro Kaneko, Kou Tanaka, Nobukatsu Hojo, Shogo Seki: VoiceGrad: Non-Parallel Any-to-Many Voice Conversion With Annealed Langevin Dynamics. 2213-2226
- Florian Schmid, Khaled Koutini, Gerhard Widmer: Dynamic Convolutional Neural Networks as Efficient Pre-Trained Audio Models. 2227-2241
- Michael Neri, Archontis Politis, Daniel Aleksander Krause, Marco Carli, Tuomas Virtanen: Speaker Distance Estimation in Enclosures From Single-Channel Audio. 2242-2254
- Triantafyllos Kefalas, Yannis Panagakis, Maja Pantic: Large-Scale Unsupervised Audio Pre-Training for Video-to-Speech Synthesis. 2255-2268
- Ju-ho Kim, Jungwoo Heo, Hyun-seo Shin, Chan-yeong Lim, Ha-Jin Yu: FA-ExU-Net: The Simultaneous Training of an Embedding Extractor and Enhancement Model for a Speaker Verification System Robust to Short Noisy Utterances. 2269-2282
- Yang Ai, Zhen-Hua Ling: Low-Latency Neural Speech Phase Prediction Based on Parallel Estimation Architecture and Anti-Wrapping Losses for Speech Generation Tasks. 2283-2296
- Yanxiong Li, Jialong Li, Yongjie Si, Jiaxin Tan, Qianhua He: Few-Shot Class-Incremental Audio Classification With Adaptive Mitigation of Forgetting and Overfitting. 2297-2311
- Tianchi Liu, Kong Aik Lee, Qiongqiong Wang, Haizhou Li: Golden Gemini is All You Need: Finding the Sweet Spots for Speaker Verification. 2324-2337
- Lei Zhao, Wenbo Zhu, Shengqiang Li, Hong Luo, Xiao-Lei Zhang, Susanto Rahardja: Multi-Resolution Convolutional Residual Neural Networks for Monaural Speech Dereverberation. 2338-2351
- Christian Geishauser, Carel van Niekerk, Nurul Lubis, Hsien-Chin Lin, Michael Heck, Shutong Feng, Benjamin Matthias Ruppik, Renato Vukovic, Milica Gasic: Learning With an Open Horizon in Ever-Changing Dialogue Circumstances. 2352-2366
- Yusuf Eren, Buket Çolak Güvenç, Engin Cemal Mengüç: Cost-Effective Acoustic Feedback Cancellers for Digital Hearing Aids. 2367-2377
- Jianchen Li, Jiqing Han, Fan Qian, Tieran Zheng, Yongjun He, Guibin Zheng: Distance Metric-Based Open-Set Domain Adaptation for Speaker Verification. 2378-2390
- Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino: Masked Modeling Duo: Towards a Universal Audio Pre-Training Framework. 2391-2406
- Guangzhi Sun, Chao Zhang, Philip C. Woodland: Graph Neural Networks for Contextual ASR With the Tree-Constrained Pointer Generator. 2407-2417
- Bengt J. Borgström, Michael S. Brandstein: A Multiscale Autoencoder (MSAE) Framework for End-to-End Neural Network Speech Enhancement. 2418-2431
- Kun Wei, Bei Li, Hang Lv, Quan Lu, Ning Jiang, Lei Xie: Conversational Speech Recognition by Learning Audio-Textual Cross-Modal Contextual Representation. 2432-2444
- Shang-Yu Su, Yung-Sung Chung, Yun-Nung Chen: Joint Dual Learning With Mutual Information Maximization for Natural Language Understanding and Generation in Dialogues. 2445-2452
- Cunhang Fan, Mingming Ding, Jianhua Tao, Ruibo Fu, Jiangyan Yi, Zhengqi Wen, Zhao Lv: Dual-Branch Knowledge Distillation for Noise-Robust Synthetic Speech Detection. 2453-2466
- Hassan Taherian, DeLiang Wang: Multi-Channel Conversational Speaker Separation via Neural Diarization. 2467-2476
- Sherif Abdulatif, Ruizhe Cao, Bin Yang: CMGAN: Conformer-Based Metric-GAN for Monaural Speech Enhancement. 2477-2493
- Puhai Yang, Heyan Huang, Shumin Shi, Xian-Ling Mao: STN4DST: A Scalable Dialogue State Tracking Based on Slot Tagging Navigation. 2494-2507
- Hang Chen, Qing Wang, Jun Du, Bao-Cai Yin, Jia Pan, Chin-Hui Lee: Optimizing Audio-Visual Speech Enhancement Using Multi-Level Distortion Measures for Audio-Visual Speech Recognition. 2508-2521
- Anderson Queiroz, Rosângela Coelho: Harmonic Detection From Noisy Speech With Auditory Frame Gain for Intelligibility Enhancement. 2522-2531
- Maodi Hu, Li Qian, Zhijun Chang, Zhixiong Zhang: KDPG-Enhanced MRC Framework for Scientific Entity Recognition in Survey Papers. 2532-2543
- Leanne Nortje, Dan Oneata, Herman Kamper: Visually Grounded Few-Shot Word Learning in Low-Resource Settings. 2544-2554
- Purnima Kamath, Chitralekha Gupta, Lonce Wyse, Suranga Nanayakkara: Example-Based Framework for Perceptually Guided Audio Texture Generation. 2555-2565
- Arka Roy, Udit Satija: A Novel Multi-Head Self-Organized Operational Neural Network Architecture for Chronic Obstructive Pulmonary Disease Detection Using Lung Sounds. 2566-2575
- Rongzhi Gu, Yi Luo: ReZero: Region-Customizable Sound Extraction. 2576-2589
- Wenbin Wang, Yang Song, Sanjay K. Jha: USAT: A Universal Speaker-Adaptive Text-to-Speech Approach. 2590-2604
- Han Han, Vincent Lostanlen, Mathieu Lagrange: Learning to Solve Inverse Problems for Perceptual Sound Matching. 2605-2615
- Nursadul Mamun, John H. L. Hansen: Speech Enhancement for Cochlear Implant Recipients Using Deep Complex Convolution Transformer With Frequency Transformation. 2616-2629
- Cheng Peng, Haobo Wang, Jue Wang, Lidan Shou, Ke Chen, Gang Chen, Chang Yao: Learning Label-Adaptive Representation for Large-Scale Multi-Label Text Classification. 2630-2640
- Junchuan Zhao, Low Qi Hong Chetwin, Ye Wang: SinTechSVS: A Singing Technique Controllable Singing Voice Synthesis System. 2641-2653
- Hyung-Seok Oh, Sang-Hoon Lee, Seong-Whan Lee: DiffProsody: Diffusion-Based Latent Prosody Generation for Expressive Speech Synthesis With Prosody Conditional Adversarial Training. 2654-2666
- Stefano Damiano, Federico Borra, Alberto Bernardini, Fabio Antonacci, Augusto Sarti: A Compressive Sensing Approach for the Reconstruction of the Soundfield Produced by Directive Sources in Reverberant Rooms. 2667-2679
- Jiaming Cheng, Ruiyu Liang, Lin Zhou, Li Zhao, Chengwei Huang, Björn W. Schuller: Residual Fusion Probabilistic Knowledge Distillation for Speech Enhancement. 2680-2691
- Shih-Lun Wu
, Chris Donahue
, Shinji Watanabe
, Nicholas J. Bryan
:
Music ControlNet: Multiple Time-Varying Controls for Music Generation. 2692-2703 - Youzhi Tu
, Man-Wai Mak
, Jen-Tzung Chien
:
Contrastive Self-Supervised Speaker Embedding With Sequential Disentanglement. 2704-2715 - Kavya Ranjan Saxena
, Vipul Arora
:
Interactive Singing Melody Extraction Based on Active Adaptation. 2729-2738 - Shuoran Jiang
, Youcheng Pan
, Qingcai Chen
, Yang Xiang
, Xiangping Wu
:
Learning to Improve Out-of-Distribution Generalization via Self-Adaptive Language Masking. 2739-2750 - Alexander Shirnin
, Nikita Andreev
, Sofia Potapova
, Ekaterina Artemova
:
Analyzing the Robustness of Vision & Language Models. 2751-2763 - Han Ding
, Linwei Zhai
, Cui Zhao
, Fei Wang
, Ge Wang
, Wei Xi
, Zhi Wang
, Jizhong Zhao
:
Genre Classification Empowered by Knowledge-Embedded Music Representation. 2764-2776 - Lester Phillip Violeta
, Ding Ma
, Wen-Chin Huang
, Tomoki Toda
:
Pretraining and Adaptation Techniques for Electrolaryngeal Speech Recognition. 2777-2789 - Yanjie Sun
, Kele Xu
, Chaorun Liu
, Yong Dou
, Huaimin Wang
, Bo Ding
, Qinghua Pan
:
Automated Data Augmentation for Audio Classification. 2716-2728 - Marc Arnela
, Oriol Guasch
:
Formant Frequency Tuning of Three-Dimensional MRI-Based Vocal Tracts for the Finite Element Synthesis of Vowels. 2790-2799 - Xiang Huang
, Hao Peng
, Dongcheng Zou
, Zhiwei Liu
, Jianxin Li
, Kay Liu
, Jia Wu
, Jianlin Su
, Philip S. Yu
:
CoSENT: Consistent Sentence Embedding via Similarity Ranking. 2800-2813 - Yangfu Li
, Jiapan Gan
, Xiaodan Lin
, Yingqiang Qiu
, Hongjian Zhan
, Hui Tian
:
DS-TDNN: Dual-Stream Time-Delay Neural Network With Global-Aware Filter for Speaker Verification. 2814-2827 - Hao Zhang
, Yixuan Zhang
, Meng Yu
, Dong Yu
:
Enhanced Acoustic Howling Suppression via Hybrid Kalman Filter and Deep Learning Models. 2828-2840 - Zhuoyuan Mao
, Chenhui Chu
, Sadao Kurohashi
:
EMS: Efficient and Effective Massively Multilingual Sentence Embedding Learning. 2841-2856 - Ernst Seidel
, Pejman Mowlaee
, Tim Fingscheidt
:
Convergence and Performance Analysis of Classical, Hybrid, and Deep Acoustic Echo Control. 2857-2870 - Haohe Liu
, Yi Yuan
, Xubo Liu
, Xinhao Mei
, Qiuqiang Kong
, Qiao Tian
, Yuping Wang, Wenwu Wang
, Yuxuan Wang, Mark D. Plumbley
:
AudioLDM 2: Learning Holistic Audio Generation With Self-Supervised Pretraining. 2871-2883 - Shu-Wen Yang
, Heng-Jui Chang
, Zili Huang, Andy T. Liu
, Cheng-I Lai
, Haibin Wu
, Jiatong Shi
, Xuankai Chang, Hsiang-Sheng Tsai
, Wen-Chin Huang
, Tzu-hsun Feng, Po-Han Chi
, Yist Y. Lin, Yung-Sung Chuang, Tzu-Hsien Huang, Wei-Cheng Tseng, Kushal Lakhotia, Shang-Wen Li, Abdelrahman Mohamed, Shinji Watanabe
, Hung-yi Lee:
A Large-Scale Evaluation of Speech Foundation Models. 2884-2899 - Liang Xu
, Xiaoxuan Bu
, Xuetao Tian
:
Dynamic Prompt-Driven Zero-Shot Relation Extraction. 2900-2912 - Dongchao Yang
, Songxiang Liu
, Rongjie Huang
, Chao Weng
, Helen Meng
:
InstructTTS: Modelling Expressive TTS in Discrete Latent Space With Natural Language Style Prompt. 2913-2925 - Zhichao Wang
, Liumeng Xue
, Qiuqiang Kong, Lei Xie
, Yuanzhe Chen, Qiao Tian
, Yuping Wang:
Multi-Level Temporal-Channel Speaker Retrieval for Zero-Shot Voice Conversion. 2926-2937 - Youngjae Chang
, Youngjoong Ko
:
Two-Step Masked Language Model for Domain-Adapting Multi-Modal Task-Oriented Dialogue Systems. 2938-2943 - Jixun Yao
, Qing Wang
, Pengcheng Guo
, Ziqian Ning, Lei Xie
:
Distinctive and Natural Speaker Anonymization via Singular Value Transformation-Assisted Matrix. 2944-2956 - Congcong Sun
, Hui Tian
, Peng Tian, Haizhou Li
, Zhenxing Qian:
Multi-Agent Deep Learning for the Detection of Multiple Speech Steganography Methods. 2957-2972 - Xianrui Wang
, Yichen Yang
, Andreas Brendel
, Tetsuya Ueda, Shoji Makino
, Jacob Benesty
, Walter Kellermann
, Jingdong Chen
:
On Semi-Blind Source Separation-Based Approaches to Nonlinear Echo Cancellation Based on Bilinear Alternating Optimization. 2973-2987 - Zhihua Fang
, Liang He
, Lin Li
, Ying Hu
:
Improving Speaker Verification With Noise-Aware Label Ensembling and Sample Selection: Learning and Correcting Noisy Speaker Labels. 2988-3001 - Yuanhang Zheng
, Zhixing Tan
, Peng Li
, Yang Liu
:
Black-Box Prompt Tuning With Subspace Learning. 3002-3013 - Shuai Zhao
, Luu Anh Tuan
, Jie Fu
, Jinming Wen
, Weiqi Luo
:
Exploring Clean Label Backdoor Attacks and Defense in Language Models. 3014-3024 - Zilu Guo
, Qing Wang
, Jun Du
, Jia Pan, Qing-Feng Liu, Chin-Hui Lee
:
A Variance-Preserving Interpolation Approach for Diffusion Models With Applications to Single Channel Speech Enhancement and Recognition. 3025-3038 - Ankita
, Shambhavi, S. Shahnawazuddin
:
Effect of Modeling Glottal Activity Parameters on Zero-Shot Children's ASR. 3039-3048 - Hao Shi
, Masato Mimura, Tatsuya Kawahara
:
Waveform-Domain Speech Enhancement Using Spectrogram Encoding for Robust Speech Recognition. 3049-3060 - Xiao Wei
, Yuhang Li, Yuke Si
, Longbiao Wang
, Xiaobao Wang
, Jianwu Dang
:
A Prompt-Based Hierarchical Pipeline for Cross-Domain Slot Filling. 3061-3075 - Xueqin Luo
, Jilu Jin
, Gongping Huang
, Jingdong Chen
, Jacob Benesty
:
Design of Fully Steerable Differential Beamformers With Linear Superarrays. 3076-3089 - Niels de Koeijer
, Martin Bo Møller
, Jorge Martínez
, Pablo Martínez-Nuevo
, Richard C. Hendriks
:
Block-Based Perceptually Adaptive Sound Zones With Reproduction Error Constraints. 3090-3100 - Weize Chen
, Xu Han
, Yankai Lin
, Kaichen He, Ruobing Xie
, Jie Zhou
, Zhiyuan Liu
, Maosong Sun:
Hyperbolic Pre-Trained Language Model. 3101-3112 - Jinghui Qin
, Zhongzhan Huang, Ying Zeng, Quanshi Zhang
, Liang Lin
:
An Introspective Data Augmentation Method for Training Math Word Problem Solvers. 3113-3127 - Wenqi Zhang
, Yongliang Shen
, Guiyang Hou
, Kuangyi Wang
, Weiming Lu
:
Specialized Mathematical Solving by a Step-By-Step Expression Chain Generation. 3128-3140 - Viktor Gunnarsson
:
Spectral Correction of Audio Objects in Stereophonic Rendering. 3141-3156 - Zijie Wang, Yegui Xiao
, Yaping Ma
, Liying Ma, Khashayar Khorasani
:
A New Hybrid Active Noise Control System With Input-Power-Controlled Online Secondary-Path Modeling. 3157-3170 - Tomoki Matsunaga
, Hiroaki Saito
:
Multi-Layer Combined Frequency and Periodicity Representations for Multi-Pitch Estimation of Multi-Instrument Music. 3171-3184 - Shuming Luan
, Yukoh Wakabayashi
, Tomoki Toda
:
Unequally Spaced Sound Field Interpolation for Rotation-Robust Beamforming. 3185-3199 - Liang Tao
, Maoshen Jia
, Changchun Bao
, Wenmeng Xiong
:
First-Order Relative Harmonic Coefficient-Based Time-Frequency Points Selection for Multi-Source DOA Estimation. 3200-3212 - Bolaji Yusuf
, Murat Saraçlar:
Written Term Detection Improves Spoken Term Detection. 3213-3223 - Mateusz Guzik
, Konrad Kowalczyk
:
On Ambisonic Source Separation With Spatially Informed Non-Negative Tensor Factorization. 3238-3255 - Yang Ai
, Xiao-Hang Jiang, Ye-Xin Lu
, Hui-Peng Du, Zhen-Hua Ling
:
APCodec: A Neural Audio Codec With Parallel Amplitude and Phase Spectrum Encoding and Decoding. 3256-3269 - Xuan Feng
, Tianlong Gu
, Liang Chang
, Xiaoli Liu
:
PROTECT: Parameter-Efficient Tuning for Few-Shot Robust Chinese Text Correction. 3270-3282 - Moti Lugasi
, Jacob Donley
, Anjali Menon, Vladimir Tourbabin
, Boaz Rafaely
:
Multi-Channel to Multi-Channel Noise Reduction and Reverberant Speech Preservation in Time-Varying Acoustic Scenes for Binaural Reproduction. 3283-3295 - Li Wang
, Lingyun Yu
, Yongdong Zhang
, Hongtao Xie
:
Generalizable Speech Spoofing Detection Against Silence Trimming With Data Augmentation and Multi-Task Meta-Learning. 3296-3310 - Xinhao Mei
, Xubo Liu
, Jianyuan Sun, Mark D. Plumbley
, Wenwu Wang
:
Towards Generating Diverse Audio Captions via Adversarial Training. 3311-3323 - Dong-Jiang Zhang, Wei-Tao Zhang
, Yuying Ma
, Zhen-Zhen Huang
:
Anti-Aliasing Speech DOA Estimation Under Spatial Aliasing Conditions. 3324-3338 - Xinhao Mei
, Chutong Meng
, Haohe Liu
, Qiuqiang Kong, Tom Ko, Chengqi Zhao, Mark D. Plumbley
, Yuexian Zou
, Wenwu Wang
:
WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research. 3339-3354 - Xiaofei Wang
, Manthan Thakker, Zhuo Chen, Naoyuki Kanda
, Sefik Emre Eskimez, Sanyuan Chen
, Min Tang
, Shujie Liu, Jinyu Li
, Takuya Yoshioka
:
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer. 3355-3364 - Xincheng Yu
, Dongyue Guo
, Jianwei Zhang
, Yi Lin
:
ROSE: A Recognition-Oriented Speech Enhancement Framework in Air Traffic Control Using Multi-Objective Learning. 3365-3378 - Gwendal Le Vaillant
, Thierry Dutoit:
Latent Space Interpolation of Synthesizer Parameters Using Timbre-Regularized Auto-Encoders. 3379-3392 - Jagabandhu Mishra
, S. R. Mahadeva Prasanna
:
Implicit Self-Supervised Language Representation for Spoken Language Diarization. 3393-3407 - Xiaoyi Qin, Na Li, Shufei Duan
, Ming Li
:
Investigating Long-Term and Short-Term Time-Varying Speaker Verification. 3408-3423 - Hao Gao
, Junlong Ren, Jiazheng Cheng
, Yong Shen
:
Optimal Modal Decomposition for Directionally Biased Sound Field Recording. 3424-3436 - Pijian Li
, Qingbao Huang
, Zhigang Li, Yi Cai
, Feng Shuang
, Qing Li
:
Multi-Granularity Feature Fusion for Image-Guided Story Ending Generation. 3437-3449 - Federico Landini
, Mireia Díez
, Themos Stafylakis
, Lukás Burget
:
DiaPer: End-to-End Neural Diarization With Perceiver-Based Attractors. 3450-3465 - Markéta Rezácková
, Daniel Tihelka
, Jindrich Matousek
:
T5G2P: Text-to-Text Transfer Transformer Based Grapheme-to-Phoneme Conversion. 3466-3476 - Michele Panariello
, Natalia A. Tomashenko
, Xin Wang
, Xiaoxiao Miao, Pierre Champion
, Hubert Nourtel
, Massimiliano Todisco, Nicholas W. D. Evans
, Emmanuel Vincent
, Junichi Yamagishi
:
The VoicePrivacy 2022 Challenge: Progress and Perspectives in Voice Anonymisation. 3477-3491 - Qi Wang
, Mingkuan Liu, Changchun Bao
, Maoshen Jia
:
Harmonic-Aware Frequency and Time Attention for Automatic Piano Transcription. 3492-3506 - Keqi Deng
, Philip C. Woodland
:
Label-Synchronous Neural Transducer for Adaptable Online E2E Speech Recognition. 3507-3516 - Elisa Tengan
, Thomas Dietzen
, Filip Elvander
, Toon van Waterschoot
:
Multi-Source Direction-of-Arrival Estimation Using Steered Response Power and Group-Sparse Optimization. 3517-3531 - Danwei Cai
, Ming Li
:
Leveraging ASR Pretrained Conformers for Speaker Verification Through Transfer Learning and Knowledge Distillation. 3532-3545 - Dejan Porjazovski
, Tamás Grósz
, Mikko Kurimo
:
From Raw Speech to Fixed Representations: A Comprehensive Evaluation of Speech Embedding Techniques. 3546-3560 - Shujie Hu
, Xurong Xie
, Mengzhe Geng
, Zengrui Jin
, Jiajun Deng
, Guinan Li
, Yi Wang, Mingyu Cui, Tianzi Wang
, Helen Meng
, Xunying Liu
:
Self-Supervised ASR Models and Features for Dysarthric and Elderly Speech Recognition. 3561-3575 - Jinmeng Wu
, Tingting Mu
, Jeyarajan Thiyagalingam
, Hanyu Hong
, Yanbin Hao
, Tianxu Zhang
, John Yannis Goulermas
:
Iterative Semantic Transformer by Greedy Distillation for Community Question Answering. 3576-3588 - Tsubasa Ochiai
, Kazuma Iwamoto, Marc Delcroix
, Rintaro Ikeshita
, Hiroshi Sato, Shoko Araki
, Shigeru Katagiri
:
Rethinking Processing Distortions: Disentangling the Impact of Speech Enhancement Errors on Speech Recognition Performance. 3589-3602 - Ruiteng Zhang
, Jianguo Wei
, Xugang Lu
, Wenhuan Lu, Di Jin
, Lin Zhang
, Junhai Xu
:
Unsupervised Adaptive Speaker Recognition by Coupling-Regularized Optimal Transport. 3603-3617 - Yuqing Li
, Xianke Wang
, Ruimin Wu, Wei Xu
, Wenqing Cheng
:
A Two-Stage Audio-Visual Fusion Piano Transcription Model Based on the Attention Mechanism. 3618-3630 - Yujia Qin
, Xiaozhi Wang
, Yusheng Su
, Yankai Lin
, Ning Ding
, Jing Yi
, Weize Chen
, Zhiyuan Liu
, Juanzi Li
, Lei Hou
, Peng Li
, Maosong Sun, Jie Zhou
:
Exploring Universal Intrinsic Task Subspace for Few-Shot Learning via Prompt Tuning. 3631-3643 - Ho-Lam Chung
, Ying-Hong Chan, Yao-Chung Fan
:
Handover QG: Question Generation by Decoder Fusion and Reinforcement Learning. 3644-3655 - Yinlong Xiao
, Zongcheng Ji
, Jianqiang Li
, Mei Han:
MVT: Chinese NER Using Multi-View Transformer. 3656-3668 - Ziqi Yuan
, Jingliang Fang
, Hua Xu
, Kai Gao
:
Multimodal Consistency-Based Teacher for Semi-Supervised Multimodal Sentiment Analysis. 3669-3683 - Sara Atito Ali Ahmed
, Muhammad Awais, Wenwu Wang
, Mark D. Plumbley
, Josef Kittler
:
ASiT: Local-Global Audio Spectrogram Vision Transformer for Event Classification. 3684-3693 - Xiachong Feng, Xiaocheng Feng, Xiyuan Du, Min-Yen Kan, Bing Qin:
Adapter-Based Selective Knowledge Distillation for Federated Multi-Domain Meeting Summarization. 3694-3708 - Tianrui Wang
, Long Zhou
, Ziqiang Zhang
, Yu Wu, Shujie Liu, Yashesh Gaur, Zhuo Chen, Jinyu Li
, Furu Wei
:
VioLA: Conditional Language Models for Speech Recognition, Synthesis, and Translation. 3709-3716 - Angelo Cesar Mendes da Silva
, Diego Furtado Silva
, Ricardo Marcondes Marcacini
:
Artist Similarity Based on Heterogeneous Graph Neural Networks. 3717-3729 - Kai-Wei Chang
, Haibin Wu
, Yu-Kai Wang
, Yuan-Kuei Wu
, Hua Shen, Wei-Cheng Tseng, Iu-thing Kang, Shang-wen Li
, Hung-Yi Lee
:
SpeechPrompt: Prompting Speech Language Models for Speech Processing Tasks. 3730-3744 - Matteo Scerbo
, Lauri Savioja
, Enzo De Sena
:
Room Acoustic Rendering Networks With Control of Scattering and Early Reflections. 3745-3758 - Yujie Wang
, Hu Zhang
, Jiye Liang
, Ru Li
:
Heterogeneous-Graph Reasoning With Context Paraphrase for Commonsense Question Answering. 3759-3770 - Bei Liu
, Haoyu Wang
, Yanmin Qian
:
Towards Lightweight Speaker Verification via Adaptive Neural Network Quantization. 3771-3784 - Etienne Labbé
, Thomas Pellegrini
, Julien Pinquier
:
CoNeTTE: An Efficient Audio Captioning System Leveraging Multiple Datasets With Task Embedding. 3785-3794 - Xue Yang
, Changchun Bao
, Xianhong Chen
:
Coarse-to-Fine Target Speaker Extraction Based on Contextual Information Exploitation. 3795-3810 - Jinfu Wang
, Feiran Yang
, Xiaoqing Hu
, Jun Yang
:
Theoretical Analysis of Maclaurin Expansion Based Linear Differential Microphone Arrays and Improved Solutions. 3811-3825 - Syu-Siang Wang
, Jia-Yang Chen, Bo-Ren Bai
, Shih-Hau Fang
, Yu Tsao
:
Unsupervised Face-Masked Speech Enhancement Using Generative Adversarial Networks With Human-in-the-Loop Assessment Metrics. 3826-3837 - Eike Jannik Nustede
, Jörn Anemüller
:
On the Generalization Ability of Complex-Valued Variational U-Networks for Single-Channel Speech Enhancement. 3838-3849 - Jaesung Huh
, Joon Son Chung
, Arsha Nagrani
, Andrew Brown, Jee-weon Jung
, Daniel Garcia-Romero, Andrew Zisserman
:
The VoxCeleb Speaker Recognition Challenge: A Retrospective. 3850-3866 - Nakamasa Inoue
, Shinta Otake, Takumi Hirose, Masanari Ohi, Rei Kawakami:
ELP-Adapters: Parameter Efficient Adapter Tuning for Various Speech Processing Tasks. 3867-3880 - Thomas Haubner
, Andreas Brendel
, Walter Kellermann:
Erratum to "End-to-End Deep Learning-Based Adaptation Control for Linear Acoustic Echo Cancellation". 3881 - Zhong-Qiu Wang
:
USDnet: Unsupervised Speech Dereverberation via Neural Forward Filtering. 3882-3895 - Sara Barahona
, Diego de Benito-Gorrón
, Doroteo T. Toledano
, Daniel Ramos
:
Enhancing Conformer-Based Sound Event Detection Using Frequency Dynamic Convolutions and BEATs Audio Embeddings. 3896-3907 - Ziye Yang
, Wenxing Yang
, Kai Xie
, Jie Chen
:
Integrating Data Priors to Weighted Prediction Error for Speech Dereverberation. 3908-3923 - Sei Ueno
, Akinobu Lee
, Tatsuya Kawahara
:
Refining Synthesized Speech Using Speaker Information and Phone Masking for Data Augmentation of Speech Recognition. 3924-3933 - Minsu Kim
, Jeongsoo Choi
, Dahun Kim, Yong Man Ro
:
Textless Unit-to-Unit Training for Many-to-Many Multilingual Speech-to-Speech Translation. 3934-3946 - Magdalena Rybicka
, Jesús Villalba
, Thomas Thebaud
, Najim Dehak
, Konrad Kowalczyk
:
End-to-End Neural Speaker Diarization With Non-Autoregressive Attractors. 3960-3973 - Bi-Cheng Yan
, Berlin Chen
:
An Effective Hierarchical Graph Attention Network Modeling Approach for Pronunciation Assessment. 3974-3985 - Alan Pawlak
, Hyunkook Lee
, Aki Mäkivirta
, Thomas Lund
:
Spatial Analysis and Synthesis Methods: Subjective and Objective Evaluations Using Various Microphone Arrays in the Auralization of a Critical Listening Room. 3986-4001 - Johannes W. de Vries
, Steven van de Par, Geert Leus
, Richard Heusdens
, Richard C. Hendriks
:
Binaural Beamforming Taking Into Account Spatial Release From Masking. 4002-4012 - Linfeng Feng
, Yijun Gong, Zhi Liu
, Xiao-Lei Zhang
, Xuelong Li
:
Learning Multi-Dimensional Speaker Localization: Axis Partitioning, Unbiased Label Distribution, and Data Augmentation. 4013-4025 - Tao Li
, Zhichao Wang
, Xinfa Zhu
, Jian Cong, Qiao Tian
, Yuping Wang, Lei Xie
:
U-Style: Cascading U-Nets With Multi-Level Speaker and Style Modeling for Zero-Shot Voice Cloning. 4026-4035 - Cheng Gong
, Xin Wang
, Erica Cooper
, Dan Wells
, Longbiao Wang
, Jianwu Dang
, Korin Richmond
, Junichi Yamagishi
:
ZMM-TTS: Zero-Shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-Supervised Discrete Speech Representations. 4036-4051 - Thomas Deppisch
, Nils Meyer-Kahlen
, Sebastià V. Amengual Garí
:
Blind Identification of Binaural Room Impulse Responses From Smart Glasses. 4052-4065 - R. Chulaka Gunasekara
, Seokhwan Kim
, Luis Fernando D'Haro
, Abhinav Rastogi
, Yun-Nung Chen
, Mihail Eric, Behnam Hedayatnia, Karthik Gopalakrishnan, Yang Liu, Chao-Wei Huang, Dilek Hakkani-Tür
, Jinchao Li, Qi Zhu, Lingxiao Luo, Lars Liden, Kaili Huang
, Shahin Shayandeh, Runze Liang, Baolin Peng, Zheng Zhang, Swadheen Shukla, Minlie Huang, Jianfeng Gao, Shikib Mehri, Yulan Feng, Carla Gordon, Seyed Hossein Alavi, David R. Traum
, Maxine Eskénazi
, Ahmad Beirami, Eunjoon Cho, Paul A. Crook
, Ankita De, Alborz Geramifard, Satwik Kottur, Seungwhan Moon, Shivani Poddar, Rajen Subba:
Overview of the Ninth Dialog System Technology Challenge: DSTC9. 4066-4076 - ChuanPeng Guo
, Wei Yang
, Liusheng Huang
:
Steganalysis of AMR Speech Stream Based on Multi-Domain Information Fusion. 4077-4090 - Ui-Hyeop Shin
, Hyung-Min Park
:
Statistical Beamformer Exploiting Non-Stationarity and Sparsity With Spatially Constrained ICA for Robust Speech Recognition. 4091-4104 - Ruohong Huan
, Guowei Zhong
, Peng Chen
, Ronghua Liang
:
TriSAT: Trimodal Representation Learning for Multimodal Sentiment Analysis. 4105-4120 - Shensian Syu
, Juncheng Xie
, Hung-yi Lee
:
Improving Non-Autoregressive Translation Quality With Pretrained Language Model, Embedding Distillation and Upsampling Strategy for CTC. 4121-4133 - Vincent Lostanlen
, Aurora Cramer
, Justin Salamon
, Andrew Farnsworth
, Benjamin Van Doren, Steve Kelling, Juan Pablo Bello
:
BirdVoxDetect: Large-Scale Detection and Classification of Flight Calls for Bird Migration Monitoring. 4134-4145 - Mingyang Zhang
, Yi Zhou, Yi Ren, Chen Zhang
, Xiang Yin, Haizhou Li
:
RefXVC: Cross-Lingual Voice Conversion With Enhanced Reference Leveraging. 4146-4156 - Yu-Fen Huang
, Nikki Moran
, Simon Coleman, Jon Kelly
, Shun-Hwa Wei, Po-Yin Chen
, Yun-Hsin Huang, Tsung-Ping Chen, Yu-Chia Kuo
, Yu-Chi Wei, Chih-Hsuan Li
, Da-Yu Huang
, Hsuan-Kai Kao
, Ting-Wei Lin
, Li Su
:
MOSA: Music Motion With Semantic Annotation Dataset for Cross-Modal Music Processing. 4157-4170 - Ying Mo
, Jiahao Liu, Hongyin Tang, Qifan Wang
, Zenglin Xu
, Jingang Wang, Xiaojun Quan
, Wei Wu
, Zhoujun Li
:
Multi-Task Multi-Attention Transformer for Generative Named Entity Recognition. 4171-4183 - Wupeng Wang
, Zexu Pan
, Xinke Li, Shuai Wang
, Haizhou Li
:
Speech Separation With Pretrained Frontend to Minimize Domain Mismatch. 4184-4198 - Zhaoci Liu
, Liping Chen
, Ya-Jun Hu, Zhen-Hua Ling
, Jia Pan
:
PE-Wav2vec: A Prosody-Enhanced Speech Model for Self-Supervised Prosody Learning in TTS. 4199-4210 - Bing Yang
, Xiaofei Li
:
Self-Supervised Learning of Spatial Acoustic Representation With Cross-Channel Signal Reconstruction and Multi-Channel Conformer. 4211-4225 - Vishal Kumar
, Vinayak Abrol
, Mathew Magimai-Doss
:
On the Quantization of Neural Models for Speaker Verification. 4226-4236 - Yadong Guan
, Jiqing Han
, Hongwei Song
, Shiwen Deng
, Guibin Zheng
, Tieran Zheng
, Yongjun He
:
Sound Activity-Aware Based Cross-Task Collaborative Training for Semi-Supervised Sound Event Detection. 3947-3959 - Miguel Ferrer
, Maria de Diego
, Alberto González
:
Filtered-X Quasi Affine Projection Algorithm for Active Noise Control Networks. 4237-4252 - Haolin Chen
, Philip N. Garner
:
Bayesian Parameter-Efficient Fine-Tuning for Overcoming Catastrophic Forgetting. 4253-4262 - Yang Li
, Cheng Yu
, Guangzhi Sun
, Weiqin Zu
, Zheng Tian
, Ying Wen
, Wei Pan
, Chao Zhang
, Jun Wang
, Yang Yang
, Fanglei Sun
:
Cross-Utterance Conditioned VAE for Speech Generation. 4263-4276 - Saurabhchand Bhati
, Jesús Villalba
, Piotr Zelasko
, Laureano Moro-Velázquez
, Najim Dehak
:
Slowness Regularized Contrastive Predictive Coding for Acoustic Unit Discovery. 4277-4287 - Ali Raza Syed
, Enis Berk Çoban, Dara Pir
, Michael I. Mandel
:
Data-Centric Methods for Environmental Sound Classification With Limited Labels. 4288-4297 - Tao Meng
, Fuchen Zhang
, Yuntao Shou
, Hongen Shao, Wei Ai
, Keqin Li
:
Masked Graph Learning With Recurrent Alignment for Multimodal Emotion Recognition in Conversation. 4298-4312 - Jinbo Hu
, Yin Cao
, Ming Wu, Qiuqiang Kong, Feiran Yang
, Mark D. Plumbley
, Jun Yang
:
Selective-Memory Meta-Learning With Environment Representations for Sound Event Localization and Detection. 4313-4327 - Liang Wan, Hongqing Liu
, Liming Shi
, Yi Zhou
, Lu Gan
:
Cross Domain Optimization for Speech Enhancement: Parallel or Cascade? 4328-4341 - Hüseyin Hacihabiboglu
:
Spherically Steerable Vector Differential Microphone Arrays. 4342-4354 - Si Ioi Ng
, Cymie Wing-Yee Ng
, Jiarui Wang, Tan Lee
:
Automatic Detection of Speech Sound Disorder in Cantonese-Speaking Pre-School Children. 4355-4368 - Juliano G. C. Ribeiro
, Shoichi Koyama
, Ryosuke Horiuchi, Hiroshi Saruwatari
:
Sound Field Estimation Based on Physics-Constrained Kernel Interpolation Adapted to Environment. 4369-4383 - Lu Li
, Maoshen Jia
, Changchun Bao
:
Three-Dimensional Room Transfer Function Parameterization Based on Multiple Concentric Planar Circular Arrays. 4384-4398 - Nagarathna Ravi
, Thishyan Raj T
, Vipul Arora
:
TeLeS: Temporal Lexeme Similarity Score to Estimate Confidence in End-to-End ASR. 4399-4408 - Chenwei Yan
, Xiangling Fu
, Xinxin You, Ji Wu
, Xien Liu
:
Graph-Based Cross-Granularity Message Passing on Knowledge-Intensive Text. 4409-4419 - Anurag Das
, Ricardo Gutierrez-Osuna
:
Improving Mispronunciation Detection Using Speech Reconstruction. 4420-4433 - Takahiro Fukumori
, Taito Ishida
, Yoichi Yamashita
:
RISC: A Corpus for Shout Type Classification and Shout Intensity Prediction. 4434-4444 - Wenbin Jiang
, Kai Yu
, Fei Wen
:
Unsupervised Speech Enhancement Using Optimal Transport and Speech Presence Probability. 4445-4455 - Zexu Pan
, Marvin Borsdorf
, Siqi Cai
, Tanja Schultz
, Haizhou Li
:
NeuroHeed: Neuro-Steered Speaker Extraction Using EEG Signals. 4456-4470 - Tomoro Tanaka
, Kohei Yatabe
, Yasuhiro Oikawa
:
PHAIN: Audio Inpainting via Phase-Aware Optimization With Instantaneous Frequency. 4471-4485 - Ilhan Aytutuldu
, Yakup Genc
, Yusuf Sinan Akgul
:
Audio-Only Phonetic Segment Classification Using Embeddings Learned From Audio and Ultrasound Tongue Imaging Data. 4501-4510 - Ran Song, Xiang Huang
, Hao Peng
, Shengxiang Gao
, Zhengtao Yu
, Philip S. Yu
:
WDEA: The Structure and Semantic Fusion With Wasserstein Distance for Low-Resource Language Entity Alignment. 4511-4525 - Philippe Gonzalez
, Zheng-Hua Tan
, Jan Østergaard
, Jesper Jensen
, Tommy Sonne Alstrøm
, Tobias May
:
Investigating the Design Space of Diffusion Models for Speech Enhancement. 4486-4500 - Sagar Dutta
, Vipul Arora
:
AudioNet: Supervised Deep Hashing for Retrieval of Similar Audio Events. 4526-4536 - Weixin Meng
, Xiaoyu Li, Andong Li
, Xiaoxue Luo, Shefeng Yan
, Xiaodong Li
, Chengshi Zheng
:
Deep Kronecker Product Beamforming for Large-Scale Microphone Arrays. 4537-4553 - Shanzheng Guan
, Mou Wang
, Zhongxin Bai
, Jianyu Wang
, Jingdong Chen
, Jacob Benesty
:
Smoothed Frame-Level SINR and Its Estimation for Sensor Selection in Distributed Acoustic Sensor Networks. 4554-4568 - Yicheng Gu
, Xueyao Zhang, Liumeng Xue, Haizhou Li
, Zhizheng Wu
:
An Investigation of Time-Frequency Representation Discriminators for High-Fidelity Vocoders. 4569-4579 - Jeong-Hwan Choi
, Joon-Young Yang
, Joon-Hyuk Chang
:
Efficient Lightweight Speaker Verification With Broadcasting CNN-Transformer and Knowledge Distillation Training of Self-Attention Maps. 4580-4595 - Nils L. Westhausen
, Hendrik Kayser
, Theresa Jansen
, Bernd T. Meyer
:
Real-Time Multichannel Deep Speech Enhancement in Hearing Aids: Comparing Monaural and Binaural Processing in Complex Acoustic Scenarios. 4596-4606 - Feihu Jin
, Yifan Liu
, Ying Tan
:
Derivative-Free Optimization for Low-Rank Adaptation in Large Language Models. 4607-4616 - Zhuoran Li
, Chunming Hu
, Richong Zhang
, Junfan Chen
, Xiaohui Guo:
Zero-Shot Cross-Lingual Named Entity Recognition via Progressive Multi-Teacher Distillation. 4617-4630 - Linqin Wang
, Xiang Huang
, Zhengtao Yu
, Hao Peng
, Shengxiang Gao
, Cunli Mao
, Yuxin Huang
, Ling Dong, Philip S. Yu
:
Zero-Shot Text Normalization via Cross-Lingual Knowledge Distillation. 4631-4646 - Debang Liu
, Tianqi Zhang
, Mads Græsbøll Christensen
, Chen Yi
, Zeliang An
:
Audio-Visual Fusion With Temporal Convolutional Attention Network for Speech Separation. 4647-4660 - Pablo M. Delgado
, Jürgen Herre:
Towards Improved Objective Perceptual Audio Quality Assessment - Part 1: A Novel Data-Driven Cognitive Model. 4661-4675 - Yichen Yang
, Ningning Pan
, Wen Zhang
, Chao Pan
, Jacob Benesty
, Jingdong Chen
:
Interference-Controlled Maximum Noise Reduction Beamformer Based on Deep-Learned Interference Manifold. 4676-4690 - Munukutla L. N. Srinivas Karthik
, Joel S.
, Nithin V. George
:
FxLMS/F Based Tap Decomposed Adaptive Filter for Decentralized Active Noise Control System. 4691-4699 - Jinlong Xue
, Yayue Deng
, Yingming Gao
, Ya Li
:
Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation. 4700-4712 - Hemant A. Patil
, Aastha Kachhi
, Ankur T. Patil
:
CQT-Based Cepstral Features for Classification of Normal vs. Pathological Infant Cry. 4713-4726 - Amrit Romana
, Kazuhito Koishida
, Emily Mower Provost
:
Automatic Disfluency Detection From Untranscribed Speech. 4727-4740 - Pengfei Li, Guangyou Zhou
, Zhiwen Xie
, Penghui Xie, Jimmy Xiangji Huang
:
Learning Dynamic and Static Representations for Extrapolation-Based Temporal Knowledge Graph Reasoning. 4741-4754 - Shen Wang
, Jialiang Dong, Longfei Wu
, Zhitao Guan
:
WEDA: Exploring Copyright Protection for Large Language Model Downstream Alignment. 4755-4767 - Inmo Yeon
, Iljoo Jeong
, Seungchul Lee
, Jung-Woo Choi
:
EchoScan: Scanning Complex Room Geometries via Acoustic Echoes. 4768-4782 - Yuting Wei
, Linmei Hu
, Yangfu Zhu
, Jiaqi Zhao
, Bin Wu
:
Knowledge-Guided Transformer for Joint Theme and Emotion Classification of Chinese Classical Poetry. 4783-4794 - Jun-Yu Ma
, Jia-Chen Gu
, Zhen-Hua Ling
, Quan Liu, Cong Liu, Guoping Hu:
Syntax-Augmented Hierarchical Interactive Encoder for Zero-Shot Cross-Lingual Information Extraction. 4795-4809 - Zheng Liang
, Ziyang Ma, Chenpeng Du
, Kai Yu
, Xie Chen
:
E³TTS: End-to-End Text-Based Speech Editing TTS System and Its Applications. 4810-4821 - Chuxuan Tong
, Iynkaran Natgunanathan, Yong Xiang
, Jianhua Li
, Tianrui Zong, James Xi Zheng
, Longxiang Gao
:
Enhancing Robustness of Speech Watermarking Using a Transformer-Based Framework Exploiting Acoustic Features. 4822-4837 - Ren Li
, Qiao Xiao
, Jianxi Yang
, Luyi Zhang
, Yu Chen
:
MRC-PASCL: A Few-Shot Machine Reading Comprehension Approach via Post-Training and Answer Span-Oriented Contrastive Learning. 4838-4849 - Dongheon Lee
, Jung-Woo Choi
:
DeFTAN-II: Efficient Multichannel Speech Enhancement With Subgroup Processing. 4850-4866 - Ge Zhu
, Jordan Darefsky, Zhiyao Duan
:
Cacophony: An Improved Contrastive Audio-Text Model. 4867-4879 - Zhanbiao Zhu
, Peijie Huang
, Haojing Huang
, Yuhong Xu
, Piyuan Lin, Leyi Lao, Shaoshen Chen
, Haojie Xie
, Shangjian Yin:
ELSF: Entity-Level Slot Filling Framework for Joint Multiple Intent Detection and Slot Filling. 4880-4893 - Wenxuan Wang
, Wenxiang Jiao
, Shuo Wang, Zhaopeng Tu
, Michael R. Lyu
:
Understanding and Mitigating the Uncertainty in Zero-Shot Translation. 4894-4904 - Bo Wang
, Yeling Tang
, Fei Wei
, Zhongjie Ba
, Kui Ren
:
FTDKD: Frequency-Time Domain Knowledge Distillation for Low-Quality Compressed Audio Deepfake Detection. 4905-4918 - Mun-Hak Lee
, Joon-Hyuk Chang
:
Proper Error Estimation and Calibration for Attention-Based Encoder-Decoder Models. 4919-4930 - Luca Della Libera
, Pooneh Mousavi
, Salah Zaiem
, Cem Subakan
, Mirco Ravanelli
:
CL-MASR: A Continual Learning Benchmark for Multilingual ASR. 4931-4944 - Hao Ma
, Zhiyuan Peng, Xu Li
, Mingjie Shao
, Xixin Wu
, Ju Liu
:
CLAPSep: Leveraging Contrastive Pre-Trained Model for Multi-Modal Query-Conditioned Target Sound Extraction. 4945-4960 - Anup Singh
, Kris Demuynck
, Vipul Arora
:
FlowHash: Accelerating Audio Search With Balanced Hashing via Normalizing Flow. 4961-4970 - Shuai Wang
, Zhengyang Chen, Kong Aik Lee
, Yanmin Qian
, Haizhou Li
:
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning. 4971-4998 - Vahid Ahmadi Kalkhorani
, DeLiang Wang
:
TF-CrossNet: Leveraging Global, Cross-Band, Narrow-Band, and Positional Encoding for Single- and Multi-Channel Speaker Separation. 4999-5009 - Efthymios Georgiou
, Yannis Avrithis
, Alexandros Potamianos
:
PowMix: A Versatile Regularizer for Multimodal Sentiment Analysis. 5010-5023 - Thomas Dietzen
, Enzo De Sena
, Toon van Waterschoot
:
Scalable-Complexity Steered Response Power Based on Low-Rank and Sparse Interpolation. 5024-5039 - Spandan Dey
, Md. Sahidullah
, Goutam Saha
:
Towards Cross-Corpora Generalization for Low-Resource Spoken Language Identification. 5040-5050 - Yabo Wang
, Bing Yang
, Xiaofei Li
:
IPDnet: A Universal Direct-Path IPD Estimation Network for Sound Source Localization. 5051-5064 - Sufeng Duan
, Hai Zhao
:
MO-Transformer: Extract High-Level Relationship Between Words for Neural Machine Translation. 5065-5077 - Weiqing Wang, Ming Li
:
Online Neural Speaker Diarization With Target Speaker Tracking. 5078-5091 - Eloi Moliner
, Filip Elvander
, Vesa Välimäki
:
Blind Audio Bandwidth Extension: A Diffusion-Based Zero-Shot Approach. 5092-5105 - Taegyun Kwon
, Dasaem Jeong
, Juhan Nam
:
Towards Efficient and Real-Time Piano Transcription Using Neural Autoregressive Models. 5106-5116 - Wei-Cheng Lin
, Kusha Sridhar, Carlos Busso
:
An Interpretable Deep Mutual Information Curriculum Metric for a Robust and Generalized Speech Emotion Recognition System. 5117-5130