Speech Recognition Technology and Applications

Speech represents the most natural means of communication between humans. By using Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) systems, machines also become able to interact with humans using speech. This is of particular importance for building interactive robots or speech-enabled c...

Bibliographic Details
Main Author: Păiș, Vasile-Florian
Format: eBook
Language: English
Published: New York: Nova Science Publishers, Incorporated, 2022
Nova Science
Edition: 1
Series: Computer Science, Technology and Applications
ISBN: 9781685079291, 1685079296
Table of Contents:
  • Intro -- Computer Science, Technology and Applications -- Speech Recognition Technology and Applications -- Contents -- Preface
  • Chapter 1 Building an Automatic Speech Recognition System for a Low-Resource Language -- Abstract -- 1. Introduction -- 1.1. Romanian as a Low-Resource Language -- 2. State-of-the-Art Architectures -- 2.1. HMM-GMM Based Architectures -- 2.2. Deep Neural Networks Architectures -- 2.3. Hybrid Architectures -- 2.4. Language Models -- 3. Method -- 3.1. Corpora -- 3.2. Automatic Grapheme-to-Phoneme Conversion -- 3.3. Language Models -- 3.4. Data Augmentation -- 3.5. Speech-to-Text Architectures for Romanian -- 3.5.1. CMUSphinx -- 3.5.2. DeepSpeech -- 3.5.3. DeepSpeech 2 -- 3.5.4. Kaldi -- 3.6. Replicable Experiments with Containerization -- 4. Results -- 4.1. CMUSphinx -- 4.2. DeepSpeech -- 4.3. Kaldi -- 4.4. Data Augmentation and SpecAugment -- 5. Discussion -- 5.1. CMUSphinx -- 5.2. DeepSpeech -- 5.3. Kaldi -- Conclusion and Future Work -- Acknowledgment -- References
  • Chapter 2 Self-Supervised Pre-Training in Speech Recognition Systems -- Abstract -- 1. Introduction -- 2. Contrastive Representation Learning -- 2.1. Training Objectives -- 2.2. Essential Components -- 3. Pre-Trained ASR Architectures -- 3.1. Wav2Vec -- 3.2. VQ-Wav2Vec -- 3.3. Wav2Vec2 -- 4. Comparison with Non-Pre-Trained Models -- 4.1. Dataset -- 4.2. Baseline Models -- 4.3. Pre-Trained Wav2Vec2 Models -- 4.4. Experimental Setup -- 4.5. Results -- 5. RELATE Integration -- Conclusion -- References
  • Chapter 3 The Impact of Speech Recognition Performance on Human-Computer Interaction -- Abstract -- 1. Introduction -- 2. Architecture of a Speech-Based Dialogue System -- 3. Implementation Details -- 3.1. Automatic Speech Recognition -- 3.2. Dialogue Manager -- 3.3. Text-to-Speech -- 4. ASR Enhancements Leading to Increased Performance of the Overall System -- 4.1. End-to-End Neural ASR System -- 4.2. Fine-Tuning the ASR System with Domain-Specific Data -- 5. Impact of ASR Enhancements -- 5.1. Evaluation of RDM with a Fine-Tuned ASR System -- 5.2. Overall System Response Time -- Conclusion -- References
  • Chapter 4 The Role of Automatic Speech Recognition Systems in Developing Medical Applications -- Abstract -- 1. Introduction -- 2. General Overview of ASR -- 3. NLP Applications in Medical Domain -- 3.1. Named Entity Recognition -- 3.2. Classification -- 3.3. Summarization -- 4. ASR Applications in Medical Domain -- 4.1. Digital Scribes for Medical Domain -- 4.1.1. Challenges of Developing Digital Scribes for the Medical Domain -- 4.2. Software and Platforms with ASR-Based Capabilities in the Medical Domain -- 4.2.1. Case Study: Amazon Medical -- Amazon Transcribe Medical -- Amazon Comprehend Medical -- 4.3. ASR and Vocal Biomarkers -- 4.4. Medical IOT and ASR -- Conclusion -- References
  • Chapter 5 Punctuation Recovery for Romanian Transcribed Documents -- Abstract -- 1. Introduction -- 2. Punctuation in Romanian Language -- 3. Corpora and Resources -- 4. Algorithms -- 5. Results -- Conclusion -- References
  • Chapter 6 Linguistic Linked Open Data for Speech Processing -- Abstract -- 1. Introduction -- 2. Linguistic Linked Open Data -- 3. Romanian Resources as Linguistic Linked Open Data -- 4. LLOD Resources for Speech Processing -- 5. Romanian LLOD Resources for Speech Processing -- 5.1. The RoLEX Lexicon -- 5.2. The RTASC Corpus -- 6. Exploiting Multiple Resources for Advanced Usage Scenarios -- Conclusion -- References
  • Chapter 7 Transformer-Based Romanian Text-to-Speech System Using Boolean Masking for Improved Prosody -- Abstract -- 1. Introduction -- 2. Related Work -- 2.1. Deep Neural Models for Speech Synthesis -- 2.2. Speech Synthesis for the Romanian Language -- 3. Datasets for the Romanian Language -- 3.1. Existing Datasets -- 3.2. Introducing a New Male Voice Dataset - RSS-Alex -- 4. FastSpeech TTS with Boolean Masking -- 5. Experiments -- 6. Results -- Conclusion -- Future Work -- Acknowledgments -- References
  • About the Editor -- About the Contributors -- Index