Multi-Label Extreme Learning Machine (MLELMs) for Bangla Regional Speech Recognition

Bibliographic details
Published in:	Applied Sciences, vol. 12, no. 11, p. 5463
Main authors:	Hossain, Prommy Sultana; Chakrabarty, Amitabha; Kim, Kyuheon; Piran, Md. Jalil
Format:	Journal Article
Language:	English
Published:	Basel, MDPI AG, 1 June 2022
Subjects:	Accuracy; Age; Aging; Anatomical systems; Bangla regional speech classification; Bengali; Brain; Classification; Datasets; Dialects; Gender; Grammar; Labeling; Language; Learning; Machinery; Mel Frequency Energy Coefficients (MFECs); Multi-Label Extreme Learning machine (MLELMs); Neural networks; Phonetics; Regional dialects; Regions; Speech; Speech recognition; Speeches; Stacked Convolution Autoencoder (SCAE); Voice recognition; Bangladesh
ISSN:	2076-3417

Description
Summary:	Extensive research has been conducted to determine age, gender, and spoken words from Bangla speech, but no work has identified the regional dialect spoken by the speaker. In this study, we therefore create a dataset containing 30 h of Bangla speech in seven regional Bangla dialects, with the goal of detecting synthesized Bangla speech and categorizing it. To categorize the regional dialect spoken in Bangla speech and determine its authenticity, we propose a model consisting of a Stacked Convolutional Autoencoder (SCAE) and a sequence of Multi-Label Extreme Learning Machines (MLELMs). The SCAE creates a detailed feature map by identifying the salient spatial and temporal qualities of the Mel Frequency Energy Coefficient (MFEC) input data. The feature map is then passed to the MLELM networks, which generate soft labels and then hard labels. Because aging causes physiological changes in the brain that alter the processing of aural information, the model takes age class into account when generating dialect class labels; this raises classification accuracy from 85% without age class consideration to 95% with it. The classification accuracy for synthesized Bangla speech labels is 95%. The proposed methodology also works well with English-language audio sets.
DOI:	10.3390/app12115463
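
The soft-label and hard-label step mentioned in the summary can be illustrated with a minimal sketch of a single extreme learning machine on multi-label targets. This is not the authors' implementation: the class name `TinyMLELM`, the hidden-layer size, the ridge term, the bipolar (-1/+1) label encoding, and the zero threshold are all assumptions made for illustration, and random vectors stand in for the SCAE feature map, which is not reproduced here.

```python
import numpy as np


class TinyMLELM:
    """Illustrative single-hidden-layer ELM for multi-label targets.

    Hypothetical helper, not taken from the paper: hidden weights are
    random and fixed, and only the output weights are solved for.
    """

    def __init__(self, n_hidden=32, ridge=1e-2, seed=0):
        self.n_hidden = n_hidden
        self.ridge = ridge  # assumed regularization strength
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Random nonlinear projection of the input features.
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, Y):
        # Y holds bipolar labels in {-1, +1}, shape (n_samples, n_labels).
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # Ridge-regularized least squares for the output weights.
        A = H.T @ H + self.ridge * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ Y)
        return self

    def soft_labels(self, X):
        # Real-valued score per label.
        return self._hidden(X) @ self.beta

    def hard_labels(self, X, threshold=0.0):
        # Threshold the soft scores into bipolar class decisions.
        return np.where(self.soft_labels(X) > threshold, 1, -1)


# Toy data standing in for encoder features (assumption, not real speech).
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 8))
Y = np.where(X[:, :3] > 0, 1, -1)  # three synthetic labels per sample

model = TinyMLELM().fit(X, Y)
soft = model.soft_labels(X)
hard = model.hard_labels(X)
```

Here `soft` holds one real-valued score per label and `hard` is the thresholded decision. The paper describes a sequence of such networks; how they are chained and how the thresholds are chosen is not stated in this record, so the sketch stops at a single network.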