Mini-batch sample selection strategies for deep learning based speech recognition

Bibliographic details
Published in: Applied acoustics, Vol. 171, p. 107573
Main authors: Dokuz, Yesim; Tufekci, Zekeriya
Format: Journal Article
Language: English
Published: Elsevier Ltd, 01.01.2021
ISSN:0003-682X, 1872-910X
Description
Summary:
• Deep learning based speech recognition is studied.
• Four sample selection strategies are proposed for mini-batch gradient descent.
• The proposed strategies are designed to better represent variations in speech datasets.
• The proposed strategies use meta information of the speech corpora.
• Experimental results show the benefits of the proposed strategies.

With the use of deep learning technologies, speech recognition systems have become more successful and human–computer interaction has become more prevalent. Deep learning based speech recognition systems are receiving growing attention and have been highly successful in all areas of speech recognition, such as voice search, mobile communication, and personal digital assistance. However, speech recognition remains challenging because of the difficulty of adapting to new languages, handling variations in speech datasets, and overcoming distorting factors. Deep learning systems can address these challenges by learning high-level abstractions from the data with a deep graph of multiple processing layers, trained with algorithms such as gradient descent optimization. This study uses mini-batch gradient descent, a variant of gradient descent optimization. We propose four strategies for selecting mini-batch samples so that each mini-batch represents the variations of each feature in the dataset, with the aim of increasing the performance of deep learning based speech recognition models. For this purpose, gender-adjusted and accent-adjusted strategies are proposed for selecting mini-batch samples. The experiments show that the proposed strategies perform better than the standard mini-batch sample selection strategy.
DOI:10.1016/j.apacoust.2020.107573
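
The summary above does not spell out how the gender-adjusted and accent-adjusted mini-batches are built, so the following is only a minimal sketch of one plausible reading: each mini-batch is filled so that the share of each metadata group roughly matches its share in the whole corpus. The function name `stratified_minibatches`, the `gender` field, and the toy corpus are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def stratified_minibatches(samples, key, batch_size, seed=0):
    """Split samples into mini-batches whose group mix mirrors the dataset.

    Assumption (not from the paper): `samples` is a list of dicts and `key`
    names a metadata field such as "gender" or "accent".
    """
    rng = random.Random(seed)

    # Group the samples by the chosen metadata value.
    groups = defaultdict(list)
    for sample in samples:
        groups[sample[key]].append(sample)

    # Shuffle inside each group, then give every sample a fractional rank
    # in [0, 1) so that each group is spread evenly over the whole ordering.
    ranked = []
    for members in groups.values():
        rng.shuffle(members)
        n = len(members)
        for i, sample in enumerate(members):
            ranked.append(((i + 0.5) / n, sample))

    # Sorting by rank interleaves the groups proportionally.
    ranked.sort(key=lambda pair: pair[0])
    ordered = [sample for _, sample in ranked]

    # Cut the interleaved ordering into consecutive mini-batches.
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


# Toy corpus: 6 utterances tagged "f" and 2 tagged "m" (made-up data).
corpus = [{"utt": f"utt{i}", "gender": "f"} for i in range(6)]
corpus += [{"utt": f"utt{i}", "gender": "m"} for i in range(6, 8)]
batches = stratified_minibatches(corpus, key="gender", batch_size=4)
```

With this toy corpus, each of the two mini-batches holds three "f" utterances and one "m" utterance, whereas plain random shuffling could place both "m" utterances in the same batch. Passing `key="accent"` would apply the same idea to accent labels, assuming the corpus carries such a field.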