Techniques for Noise Robustness in Automatic Speech Recognition
Automatic speech recognition (ASR) systems are finding increasing use in everyday life. Many of the commonplace environments where the systems are used are noisy, for example users calling up a voice search system from a busy cafeteria or a street. This can result in degraded speech recordings and adversely affect the performance of speech recognition systems.
Saved in:
| Main author: | |
|---|---|
| Format: | eBook |
| Language: | English |
| Published: | Newark: Wiley (John Wiley & Sons, Incorporated; Wiley-Blackwell), 2012 |
| Edition: | 1 |
| Subjects: | |
| ISBN: | 9781119970880, 1119970881, 9781118392669, 1118392663 |
| Online access: | Get full text |
| Abstract | Automatic speech recognition (ASR) systems are finding increasing use in everyday life. Many of the commonplace environments where the systems are used are noisy, for example users calling up a voice search system from a busy cafeteria or a street. This can result in degraded speech recordings and adversely affect the performance of speech recognition systems.  As the use of ASR systems increases, knowledge of the state-of-the-art in techniques to deal with such problems becomes critical to system and application engineers and researchers who work with or on ASR technologies. This book presents a comprehensive survey of the state-of-the-art in techniques used to improve the robustness of speech recognition systems to these degrading external influences. Key features: Reviews all the main noise robust ASR approaches, including signal separation, voice activity detection, robust feature extraction, model compensation and adaptation, missing data techniques and recognition of reverberant speech. Acts as a timely exposition of the topic in light of more widespread use in the future of ASR technology in challenging environments. Addresses robustness issues and signal degradation which are both key requirements for practitioners of ASR. Includes contributions from top ASR researchers from leading research units in the field. |
|---|---|
| Author | Tuomas Virtanen, Rita Singh, Bhiksha Raj |
| ContentType | eBook |
| Copyright | 2012 Wiley |
| DEWEY | 006.454 |
| DOI | 10.1002/9781118392683 |
| Discipline | Engineering; Computer Science |
| EISBN | 9781118392669, 9781118392683, 111839268X, 1118392671, 9781118392676, 1118392663, 9781119970880, 1119970881 |
| Edition | 1 |
| Editor | Virtanen, Tuomas (Tampere University of Technology); Singh, Rita (Carnegie Mellon University); Raj, Bhiksha (Carnegie Mellon University) |
| Genre | Electronic books |
| ISBN | 9781119970880, 1119970881, 9781118392669, 1118392663 |
| IsPeerReviewed | false |
| IsScholarly | false |
| LCCallNum_Ident | TK7882.S65.V57 2012 |
| Language | English |
| OCLC | 815390381 |
| PageCount | 520 |
| PublicationCentury | 2000 |
| PublicationDate | 2012; 2012-09-19; 2012-11-28 |
| PublicationDecade | 2010 |
| PublicationPlace | Newark |
| PublicationYear | 2012 |
| Publisher | Wiley; John Wiley & Sons, Incorporated; Wiley-Blackwell |
| SubjectTerms | Automatic speech recognition; Communication, Networking and Broadcast Technologies; Components, Circuits, Devices and Systems; Computing and Processing; Signal Processing and Analysis; TECHNOLOGY & ENGINEERING |
| TableOfContents | List of Contributors xv Acknowledgments xvii 1 Introduction 1
Tuomas Virtanen, Rita Singh, Bhiksha Raj 1.1 Scope of the Book 1 1.2 Outline 2 1.3 Notation 4 Part One FOUNDATIONS 2 The Basics of Automatic Speech Recognition 9
Rita Singh, Bhiksha Raj, Tuomas Virtanen 2.1 Introduction 9 2.2 Speech Recognition Viewed as Bayes Classification 10 2.3 Hidden Markov Models 11 2.3.1 Computing Probabilities with HMMs 12 2.3.2 Determining the State Sequence 17 2.3.3 Learning HMM Parameters 19 2.3.4 Additional Issues Relating to Speech Recognition Systems 20 2.4 HMM-Based Speech Recognition 24 2.4.1 Representing the Signal 24 2.4.2 The HMM for a Word Sequence 25 2.4.3 Searching through all Word Sequences 26 References 29 3 The Problem of Robustness in Automatic Speech Recognition 31
Bhiksha Raj, Tuomas Virtanen, Rita Singh 3.1 Errors in Bayes Classification 31 3.1.1 Type 1 Condition: Mismatch Error 33 3.1.2 Type 2 Condition: Increased Bayes Error 34 3.2 Bayes Classification and ASR 35 3.2.1 All We Have is a Model: A Type 1 Condition 35 3.2.2 Intrinsic Interferences—Signal Components that are Unrelated to the Message: A Type 2 Condition 36 3.2.3 External Interferences—The Data are Noisy: Type 1 and Type 2 Conditions 36 3.3 External Influences on Speech Recordings 36 3.3.1 Signal Capture 37 3.3.2 Additive Corruptions 41 3.3.3 Reverberation 42 3.3.4 A Simplified Model of Signal Capture 43 3.4 The Effect of External Influences on Recognition 44 3.5 Improving Recognition under Adverse Conditions 46 3.5.1 Handling the Model Mismatch Error 46 3.5.2 Dealing with Intrinsic Variations in the Data 47 3.5.3 Dealing with Extrinsic Variations 47 References 50 Part Two SIGNAL ENHANCEMENT 4 Voice Activity Detection, Noise Estimation, and Adaptive Filters for Acoustic Signal Enhancement 53
Rainer Martin, Dorothea Kolossa 4.1 Introduction 53 4.2 Signal Analysis and Synthesis 55 4.2.1 DFT-Based Analysis Synthesis with Perfect Reconstruction 55 4.2.2 Probability Distributions for Speech and Noise DFT Coefficients 57 4.3 Voice Activity Detection 58 4.3.1 VAD Design Principles 58 4.3.2 Evaluation of VAD Performance 62 4.3.3 Evaluation in the Context of ASR 62 4.4 Noise Power Spectrum Estimation 65 4.4.1 Smoothing Techniques 65 4.4.2 Histogram and GMM Noise Estimation Methods 67 4.4.3 Minimum Statistics Noise Power Estimation 67 4.4.4 MMSE Noise Power Estimation 68 4.4.5 Estimation of the A Priori Signal-to-Noise Ratio 69 4.5 Adaptive Filters for Signal Enhancement 71 4.5.1 Spectral Subtraction 71 4.5.2 Nonlinear Spectral Subtraction 73 4.5.3 Wiener Filtering 74 4.5.4 The ETSI Advanced Front End 75 4.5.5 Nonlinear MMSE Estimators 75 4.6 ASR Performance 80 4.7 Conclusions 81 References 82 5 Extraction of Speech from Mixture Signals 87
Paris Smaragdis 5.1 The Problem with Mixtures 87 5.2 Multichannel Mixtures 88 5.2.1 Basic Problem Formulation 88 5.2.2 Convolutive Mixtures 92 5.3 Single-Channel Mixtures 98 5.3.1 Problem Formulation 98 5.3.2 Learning Sound Models 100 5.3.3 Separation by Spectrogram Factorization 101 5.3.4 Dealing with Unknown Sounds 105 5.4 Variations and Extensions 107 5.5 Conclusions 107 References 107 6 Microphone Arrays 109
John McDonough, Kenichi Kumatani 6.1 Speaker Tracking 110 6.2 Conventional Microphone Arrays 113 6.3 Conventional Adaptive Beamforming Algorithms 120 6.3.1 Minimum Variance Distortionless Response Beamformer 120 6.3.2 Noise Field Models 122 6.3.3 Subband Analysis and Synthesis 123 6.3.4 Beamforming Performance Criteria 126 6.3.5 Generalized Sidelobe Canceller Implementation 129 6.3.6 Recursive Implementation of the GSC 130 6.3.7 Other Conventional GSC Beamformers 131 6.3.8 Beamforming based on Higher Order Statistics 132 6.3.9 Online Implementation 136 6.3.10 Speech-Recognition Experiments 140 6.4 Spherical Microphone Arrays 142 6.5 Spherical Adaptive Algorithms 148 6.6 Comparative Studies 149 6.7 Comparison of Linear and Spherical Arrays for DSR 152 6.8 Conclusions and Further Reading 154 References 155 Part Three FEATURE ENHANCEMENT 7 From Signals to Speech Features by Digital Signal Processing 161
Matthias Wölfel 7.1 Introduction 161 7.1.1 About this Chapter 162 7.2 The Speech Signal 162 7.3 Spectral Processing 163 7.3.1 Windowing 163 7.3.2 Power Spectrum 165 7.3.3 Spectral Envelopes 166 7.3.4 LP Envelope 166 7.3.5 MVDR Envelope 169 7.3.6 Warping the Frequency Axis 171 7.3.7 Warped LP Envelope 175 7.3.8 Warped MVDR Envelope 176 7.3.9 Comparison of Spectral Estimates 177 7.3.10 The Spectrogram 179 7.4 Cepstral Processing 179 7.4.1 Definition and Calculation of Cepstral Coefficients 180 7.4.2 Characteristics of Cepstral Sequences 181 7.5 Influence of Distortions on Different Speech Features 182 7.5.1 Objective Functions 182 7.5.2 Robustness against Noise 185 7.5.3 Robustness against Echo and Reverberation 187 7.5.4 Robustness against Changes in Fundamental Frequency 189 7.6 Summary and Further Reading 191 References 191 8 Features Based on Auditory Physiology and Perception 193
Richard M. Stern, Nelson Morgan 8.1 Introduction 193 8.2 Some Attributes of Auditory Physiology and Perception 194 8.2.1 Peripheral Processing 194 8.2.2 Processing at more Central Levels 200 8.2.3 Psychoacoustical Correlates of Physiological Observations 202 8.2.4 The Impact of Auditory Processing on Conventional Feature Extraction 206 8.2.5 Summary 208 8.3 “Classic” Auditory Representations 208 8.4 Current Trends in Auditory Feature Analysis 213 8.5 Summary 221 Acknowledgments 222 References 222 9 Feature Compensation 229
Jasha Droppo 9.1 Life in an Ideal World 229 9.1.1 Noise Robustness Tasks 229 9.1.2 Probabilistic Feature Enhancement 230 9.1.3 Gaussian Mixture Models 231 9.2 MMSE-SPLICE 232 9.2.1 Parameter Estimation 233 9.2.2 Results 236 9.3 Discriminative SPLICE 237 9.3.1 The MMI Objective Function 238 9.3.2 Training the Front-End Parameters 239 9.3.3 The Rprop Algorithm 240 9.3.4 Results 241 9.4 Model-Based Feature Enhancement 242 9.4.1 The Additive Noise-Mixing Equation 243 9.4.2 The Joint Probability Model 244 9.4.3 Vector Taylor Series Approximation 246 9.4.4 Estimating Clean Speech 247 9.4.5 Results 247 9.5 Switching Linear Dynamic System 248 9.6 Conclusion 249 References 249 10 Reverberant Speech Recognition 251
Reinhold Haeb-Umbach, Alexander Krueger 10.1 Introduction 251 10.2 The Effect of Reverberation 252 10.2.1 What is Reverberation? 252 10.2.2 The Relationship between Clean and Reverberant Speech Features 254 10.2.3 The Effect of Reverberation on ASR Performance 258 10.3 Approaches to Reverberant Speech Recognition 258 10.3.1 Signal-Based Techniques 259 10.3.2 Front-End Techniques 260 10.3.3 Back-End Techniques 262 10.3.4 Concluding Remarks 265 10.4 Feature Domain Model of the Acoustic Impulse Response 265 10.5 Bayesian Feature Enhancement 267 10.5.1 Basic Approach 268 10.5.2 Measurement Update 269 10.5.3 Time Update 270 10.5.4 Inference 271 10.6 Experimental Results 272 10.6.1 Databases 272 10.6.2 Overview of the Tested Methods 273 10.6.3 Recognition Results on Reverberant Speech 274 10.6.4 Recognition Results on Noisy Reverberant Speech 276 10.7 Conclusions 277 Acknowledgment 278 References 278 Part Four MODEL ENHANCEMENT 11 Adaptation and Discriminative Training of Acoustic Models 285
Yannick Estève, Paul Deléglise 11.1 Introduction 285 11.1.1 Acoustic Models 286 11.1.2 Maximum Likelihood Estimation 287 11.2 Acoustic Model Adaptation and Noise Robustness 288 11.2.1 Static (or Offline) Adaptation 289 11.2.2 Dynamic (or Online) Adaptation 289 11.3 Maximum A Posteriori Reestimation 290 11.4 Maximum Likelihood Linear Regression 293 11.4.1 Class Regression Tree 294 11.4.2 Constrained Maximum Likelihood Linear Regression 297 11.4.3 CMLLR Implementation 297 11.4.4 Speaker Adaptive Training 298 11.5 Discriminative Training 299 11.5.1 MMI Discriminative Training Criterion 301 11.5.2 MPE Discriminative Training Criterion 302 11.5.3 I-smoothing 303 11.5.4 MPE Implementation 304 11.6 Conclusion 307 References 308 12 Factorial Models for Noise Robust Speech Recognition 311
John R. Hershey, Steven J. Rennie, Jonathan Le Roux 12.1 Introduction 311 12.2 The Model-Based Approach 313 12.3 Signal Feature Domains 314 12.4 Interaction Models 317 12.4.1 Exact Interaction Model 318 12.4.2 Max Model 320 12.4.3 Log-Sum Model 321 12.4.4 Mel Interaction Model 321 12.5 Inference Methods 322 12.5.1 Max Model Inference 322 12.5.2 Parallel Model Combination 324 12.5.3 Vector Taylor Series Approaches 326 12.5.4 SNR-Dependent Approaches 331 12.6 Efficient Likelihood Evaluation in Factorial Models 332 12.6.1 Efficient Inference using the Max Model 332 12.6.2 Efficient Vector-Taylor Series Approaches 334 12.6.3 Band Quantization 335 12.7 Current Directions 337 12.7.1 Dynamic Noise Models for Robust ASR 338 12.7.2 Multi-Talker Speech Recognition using Graphical Models 339 12.7.3 Noise Robust ASR using Non-Negative Basis Representations 340 References 341 13 Acoustic Model Training for Robust Speech Recognition 347
Michael L. Seltzer 13.1 Introduction 347 13.2 Traditional Training Methods for Robust Speech Recognition 348 13.3 A Brief Overview of Speaker Adaptive Training 349 13.4 Feature-Space Noise Adaptive Training 351 13.4.1 Experiments using fNAT 352 13.5 Model-Space Noise Adaptive Training 353 13.6 Noise Adaptive Training using VTS Adaptation 355 13.6.1 Vector Taylor Series HMM Adaptation 355 13.6.2 Updating the Acoustic Model Parameters 357 13.6.3 Updating the Environmental Parameters 360 13.6.4 Implementation Details 360 13.6.5 Experiments using NAT 361 13.7 Discussion 364 13.7.1 Comparison of Training Algorithms 364 13.7.2 Comparison to Speaker Adaptive Training 364 13.7.3 Related Adaptive Training Methods 365 13.8 Conclusion 366 References 366 Part Five COMPENSATION FOR INFORMATION LOSS 14 Missing-Data Techniques: Recognition with Incomplete Spectrograms 371
Jon Barker 14.1 Introduction 371 14.2 Classification with Incomplete Data 373 14.2.1 A Simple Missing Data Scenario 374 14.2.2 Missing Data Theory |
| Title | Techniques for Noise Robustness in Automatic Speech Recognition |
| URI | https://ieeexplore.ieee.org/servlet/opac?bknumber=8040145 https://www.perlego.com/book/1014249/techniques-for-noise-robustness-in-automatic-speech-recognition-pdf https://ebookcentral.proquest.com/lib/[SITE_ID]/detail.action?docID=1032417 https://learning.oreilly.com/library/view/~/9781118392669/?ar https://www.vlebooks.com/vleweb/product/openreader?id=none&isbn=9781118392669 https://www.vlebooks.com/vleweb/product/openreader?id=none&isbn=9781118392676&uid=none |
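As a minimal illustration of one of the techniques the book covers (spectral subtraction, the subject of Section 4.5.1), the sketch below applies magnitude spectral subtraction to synthetic data. It is not code from the book; the function name, the flooring constant, and the synthetic spectra are hypothetical choices for the demo, which assumes only NumPy.

```python
# Illustrative sketch, not from the book: magnitude spectral subtraction
# on synthetic data. All names and parameters here are hypothetical.
import numpy as np

def spectral_subtract(noisy_mag, noise_mag, floor=0.01):
    """Subtract a noise magnitude estimate from a noisy magnitude
    spectrum, flooring the result to avoid negative magnitudes."""
    cleaned = noisy_mag - noise_mag
    return np.maximum(cleaned, floor * noisy_mag)

# Synthetic example: a "speech" magnitude spectrum plus flat additive noise.
rng = np.random.default_rng(0)
speech = np.abs(rng.normal(5.0, 1.0, size=256))
noise = np.full(256, 2.0)
noisy = speech + noise

enhanced = spectral_subtract(noisy, noise)

# With a perfect noise estimate, the enhanced spectrum matches the clean one,
# so its error against the clean spectrum is below that of the noisy input.
err_noisy = np.mean((noisy - speech) ** 2)
err_enh = np.mean((enhanced - speech) ** 2)
print(err_enh < err_noisy)  # → True
```

In practice the noise magnitude is not known and must be estimated, e.g. with the minimum-statistics or MMSE noise power estimators listed in Section 4.4, and the flooring step is what distinguishes practical variants from the plain subtraction shown here.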

