
Speech Enhancement


J. Benesty, S. Makino, J. Chen

Speech Enhancement
With 136 Figures and 18 Tables

Springer

2008 AGI-Information Management Consultants


May be used for personal purposes only or by
libraries associated with the dandelon.com network.

Contents

1 Introduction
Jacob Benesty, Shoji Makino, Jingdong Chen
1.1 Speech Enhancement
1.2 Challenges and Opportunities
1.3 Organization of the Book
1.4 Further Reading
References
2 Study of the Wiener Filter for Noise Reduction
Jacob Benesty, Jingdong Chen, Yiteng (Arden) Huang, Simon Doclo
2.1 Introduction
2.2 Estimation of the Clean Speech Samples
2.3 Estimation of the Noise Samples
2.4 Important Relationships Between Noise Reduction and Speech Distortion
2.5 Particular Case: White Gaussian Noise
2.6 Better Ways to Manage Noise Reduction and Speech Distortion
2.6.1 A Suboptimal Filter
2.6.2 Noise Reduction Exploiting the Speech Model
2.6.3 Noise Reduction with Multiple Microphones
2.7 Simulation Experiments
2.8 Conclusions
References
3 Statistical Methods for the Enhancement of Noisy Speech
Rainer Martin
3.1 Introduction
3.2 Spectral Analysis
3.3 The Wiener Filter and its Implementation
3.4 Estimation of Spectral Amplitudes
3.4.1 MMSE Estimation
3.4.2 Maximum Likelihood and MAP Estimation
3.5 MMSE Estimation Using Super-Gaussian Speech Models
3.6 Background Noise Power Estimation
3.6.1 Minimum Statistics Noise Power Estimation
3.7 The MELPe Speech Coder
3.8 Conclusions
References

4 Single- and Multi-Microphone Spectral Amplitude Estimation Using a Super-Gaussian Speech Model
Thomas Lotter
4.1 Introduction
4.2 Single-Channel Statistical Filter
4.2.1 Statistical Model
4.2.2 Speech Estimators
4.3 Multichannel Statistical Filter
4.3.1 Joint Statistical Model
4.3.2 Multichannel MAP Spectral Amplitude Estimation
4.4 Experimental Results
4.5 Conclusions
References

5 From Volatility Modeling of Financial Time-Series to Stochastic Modeling and Enhancement of Speech Signals
Israel Cohen
5.1 Introduction
5.2 Problem Formulation
5.3 Spectral Analysis
5.4 Statistical Model for Speech Signals
5.5 Model Estimation
5.6 Experimental Results
5.7 Conclusions
References
6 Single-Microphone Noise Suppression for 3G Handsets Based on Weighted Noise Estimation
Akihiko Sugiyama, Masanori Kato, Masahiro Serizawa
6.1 Introduction
6.2 Conventional Noise Suppression Algorithm
6.2.1 MMSE-STSA
6.2.2 Problem in Noise Estimation
6.3 New Noise Suppression Algorithm
6.3.1 Weighted Noise Estimation
6.3.2 Spectral Gain Modification
6.3.3 Computational Requirements
6.4 Evaluation
6.4.1 Objective Evaluation for Noise Estimation
6.4.2 Subjective Evaluation
6.5 Conclusions
References

7 Signal Subspace Techniques for Speech Enhancement
Firas Jabloun, Benoit Champagne
7.1 Introduction
7.2 Signal and Noise Models
7.3 Linear Signal Estimation
7.3.1 Least-Squares Estimator
7.3.2 The Linear Minimum Mean Squared Error Estimator
7.3.3 The Time-Domain Constrained Estimator
7.3.4 The Spectral-Domain Constrained Estimator
7.4 Handling Colored Noise
7.4.1 Prewhitening
7.4.2 The Generalized Eigenvalue Decomposition Method
7.4.3 The Rayleigh Quotient Method
7.5 A Filterbank Interpretation
7.5.1 The Frequency to Eigendomain Transformation
7.5.2 The Eigen Filterbank
7.6 Implementation Issues
7.6.1 Estimating the Covariance Matrix
7.6.2 Parameter Analysis
7.7 Fast Subspace Estimation Techniques
7.7.1 Fast Eigenvalue Decomposition Methods
7.7.2 Subspace Tracking Methods
7.7.3 The Frame Based EVD (FBEVD) Method
7.8 Some Recent Developments
7.8.1 Auditory Masking
7.8.2 Multi-Microphone Systems
7.8.3 Subband Processing
7.9 Conclusions
References
8 Speech Enhancement: Application of the Kalman Filter in the Estimate-Maximize (EM) Framework
Sharon Gannot
8.1 Introduction
8.2 Signal Model
8.3 EM-Based Algorithm
8.3.1 State Estimation (E-Step)
8.3.2 Parameter Estimation (M-Step)
8.3.3 Reduced Complexity
8.3.4 Discussion
8.4 Parameter Estimation Using Higher-Order Statistics
8.5 Gradient-Based Sequential Algorithm
8.6 All-Kalman Speech and Parameter Estimation
8.6.1 Dual Scheme
8.6.2 Joint Scheme
8.7 Experimental Study
8.7.1 Experimental Setup
8.7.2 Verifying the Gaussian Assumption
8.7.3 Objective Evaluation
8.7.4 Subjective Evaluation
8.7.5 Comparison Between EM-Based Algorithms
8.7.6 Evaluation of the UKF
8.8 Conclusions
References
9 Speech Distortion Weighted Multichannel Wiener Filtering Techniques for Noise Reduction
Simon Doclo, Ann Spriet, Jan Wouters, Marc Moonen
9.1 Introduction
9.2 GSC and Spatially Pre-Processed SDW-MWF
9.2.1 Notation and General Structure
9.2.2 Generalized Sidelobe Canceller
9.2.3 Speech Distortion Weighted Multichannel Wiener Filter
9.3 Frequency-Domain Criterion for SDW-MWF
9.3.1 Frequency-Domain Notation
9.3.2 Normal Equations
9.3.3 Adaptive Algorithm
9.3.4 Practical Implementation
9.4 Approximations for Reducing the Complexity
9.4.1 Block-Diagonal Correlation Matrices
9.4.2 Diagonal Correlation Matrices
9.4.3 Unconstrained Algorithms
9.4.4 Summary
9.5 Experimental Results
9.5.1 Setup and Performance Measures
9.5.2 SNR Improvement and Robustness Against Microphone Mismatch
9.5.3 Tracking Performance
9.6 Conclusions
References
10 Adaptive Microphone Arrays Employing Spatial Quadratic Soft Constraints and Spectral Shaping
Sven Nordholm, Hai Quang Dam, Nedelko Grbic, Siow Yong Low
10.1 Introduction
10.2 Signal Modelling and Problem Formulation
10.2.1 Analysis and Synthesis Filterbanks
10.2.2 The Wiener Solution
10.2.3 The Space Constrained Source Covariance Information
10.3 Robust Soft Constrained Adaptive Microphone Array (RSCAMA)
10.3.1 Problem Formulation
10.3.2 A Recursive Algorithm for the RSCAMA
10.4 Noise Statistics Updated Adaptive Microphone Array (NSUAMA)
10.4.1 Problem Formulation
10.4.2 The Noise Covariance Detector
10.4.3 Estimation of Power Spectrum of SOI
10.4.4 The NSUAMA Algorithm
10.5 Evaluations
10.5.1 The Simulation Scenario
10.5.2 Results for RSCAMA and NSUAMA Beamformers
10.6 Conclusions
References

11 Single-Microphone Blind Dereverberation
Tomohiro Nakatani, Masato Miyoshi, Keisuke Kinoshita
11.1 Introduction
11.2 Overview of Existing Approaches
11.2.1 Blind Inverse Filtering
11.2.2 Dereverberation Based on Speech Signal Features
11.3 Harmonicity of Speech Signals and Its Robust Estimation
11.3.1 Model of Speech Harmonicity
11.3.2 Adaptive Harmonic Filtering
11.3.3 Robust F0 Estimation and Voicing Detection
11.4 Harmonicity Based Dereverberation - HERB
11.4.1 Basic Idea
11.4.2 Model of Reverberant Speech Signal
11.4.3 Dereverberation Filter
11.4.4 Interpretation of the Dereverberation Filter
11.5 Implementation of a Prototype System
11.5.1 Dereverberation Filter Calculation
11.5.2 Heuristics Improving Accuracy of F0 Estimation and Voicing Decisions with Reverberation
11.6 Simulation Experiments
11.6.1 Task: Dereverberation of Word Utterances
11.6.2 Energy Decay Curves of Impulse Responses
11.6.3 Speaker Dependent Word Recognition Rate
11.7 Future Directions
11.7.1 Theoretical Extension of HERB
11.7.2 Accuracy Improvement of Speech Model
11.7.3 Reduction of Training Data Size
11.8 Conclusions
References

12 Separation and Dereverberation of Speech Signals with Multiple Microphones
Yiteng (Arden) Huang, Jacob Benesty, Jingdong Chen
12.1 Introduction
12.2 Signal Model and Problem Formulation
12.3 Blind Identification of a SIMO System
12.4 Separating Reverberant Speech and Concurrent Interference
12.4.1 Example: Removing Interference Signals in a 2 × 3 MIMO Acoustic System
12.4.2 Generalization
12.5 Speech Dereverberation
12.5.1 Principle
12.5.2 The Least-Squares Implementation
12.6 Simulations
12.6.1 Performance Measures
12.6.2 Experimental Setup
12.6.3 Experimental Results
12.7 Conclusions
References
13 Frequency-Domain Blind Source Separation
Hiroshi Sawada, Ryo Mukai, Shoko Araki, Shoji Makino
13.1 Introduction
13.2 BSS for Convolutive Mixtures
13.3 Overview of Frequency-Domain Approach
13.4 Complex-Valued ICA
13.5 Source Localization
13.5.1 Basic Theory for Nearfield Model
13.5.2 DOA Estimation with Farfield Model
13.6 Permutation Alignment
13.6.1 Localization Approach
13.6.2 Correlation Approach
13.6.3 Integrated Method
13.7 Scaling Alignment
13.8 Spectral Smoothing
13.8.1 Windowing
13.8.2 Minimizing Error by Adjusting Scaling Ambiguity
13.9 Experimental Results
13.9.1 Linear Array
13.9.2 Planar Array
13.10 Conclusions
References

14 Subband Based Blind Source Separation
Shoko Araki, Shoji Makino
14.1 Introduction
14.2 BSS of Convolutive Mixtures
14.2.1 Model Description
14.2.2 Frequency-Domain BSS and Related Issue
14.3 Subband Based BSS
14.3.1 Configuration of Subband BSS
14.3.2 Time-Domain BSS Implementation for a Separation Stage
14.3.3 Solving the Permutation and Scaling Problems
14.4 Basic Experiments for Subband BSS
14.4.1 Experimental Setup
14.4.2 Subband System
14.4.3 Conventional Frequency-Domain BSS
14.4.4 Conventional Fullband Time-Domain BSS
14.4.5 Results
14.4.6 Discussion
14.5 Frequency-Appropriate Processing for Further Improvement
14.5.1 Longer Separation Filters in Low Frequency Bands
14.5.2 Overlap-Blockshift in Low Frequency Bands
14.5.3 Discussion
14.6 Conclusions
References
15 Real-Time Blind Source Separation for Moving Speech Signals
Ryo Mukai, Hiroshi Sawada, Shoko Araki, Shoji Makino
15.1 Introduction
15.2 ICA Based BSS of Convolutive Mixtures
15.2.1 Frequency-Domain ICA
15.2.2 Permutation and Scaling Problems
15.2.3 Low Delay Blockwise Batch Algorithm
15.3 Residual Crosstalk Cancellation
15.3.1 Straight and Crosstalk Components of BSS
15.3.2 Model of Residual Crosstalk Component Estimation
15.3.3 Adaptive Algorithm and Spectrum Estimation
15.4 Experiments and Discussions
15.4.1 Experimental Conditions
15.4.2 Performance for Fixed Sources
15.4.3 Moving Target and Moving Interference
15.4.4 Performance of Blockwise Batch Algorithm with Postprocessing
15.4.5 Performance of Online Algorithm
15.5 Conclusions
References

16 Separation of Speech by Computational Auditory Scene Analysis
Guy J. Brown, DeLiang Wang
16.1 Introduction
16.2 Auditory Scene Analysis
16.3 Computational Auditory Scene Analysis
16.3.1 Peripheral Auditory Processing and Feature Extraction
16.3.2 Monaural Approaches
16.3.3 Binaural Approaches
16.3.4 Frameworks for Cue Integration
16.4 Integrating CASA with Speech Recognition
16.5 CASA Compared to ICA
16.6 Challenges for CASA
16.7 Conclusions
References

Index