Augmenting machine learning for Amharic speech recognition: a paradigm of patient’s lips motion detection


Bibliographic details
Published in:	Multimedia tools and applications, Volume 81; issue 17; pp. 24377-24397
Main authors:	Birara, Muluken; Gebremeskel, Gebeyehu Belay
Format:	Journal Article
Language:	English
Publication details:	New York: Springer US, 01.07.2022; Springer Nature B.V
Subjects:	Algorithms; Color; Computer Communication Networks; Computer Science; Data Structures and Information Theory; Edge detection; Feature extraction; Machine learning; Motion perception; Multimedia Information Systems; Object recognition; Physicians; Special Purpose and Application-Based Systems; Speech recognition; Support vector machines; Voice recognition; Lips motion; Average feature; Saturated component
ISSN:	1380-7501, 1573-7721

Description
Summary:	Automatic lip motion recognition is an essential input for visual speech detection. It is a technological approach to supporting people who are deaf or hard of hearing and to addressing the challenge of silent communication in day-to-day life. However, recognition is difficult because of variation in pronunciation, speech speed, gestures, color, and makeup, as well as the video quality of the camera and the method of feature extraction. This paper proposes a solution for automatic lip motion recognition that identifies lip movements and characterizes their association with spoken words in the Amharic language, using the information available in lip movements. The input video is converted into consecutive image frames. We use the Viola-Jones object detection algorithm, the YIQ color space, and saturation components to detect lip images from the face area. Sobel edge detection and morphological image operations are applied to identify and extract the exact contour of the lip. We applied ANN and SVM classifiers to averaged shape-information features and obtained classification accuracies of 65.71% and 66.43% for ANN and SVM, respectively. The Amharic speech recognition approach presented here is a newly introduced technology intended to enhance the academic and linguistic skills of people with hearing impairments, health domain experts, physicians, researchers, and others. Future research directions are presented in light of the findings.
DOI:	10.1007/s11042-022-12399-w
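
Illustrative note: the summary names a YIQ color-space step and Sobel edge detection but gives no implementation details. The sketch below is only a minimal illustration of those two steps, not the authors' code. The function names `rgb_to_yiq` and `sobel_magnitude` are hypothetical, the conversion matrix uses the commonly cited approximate NTSC coefficients (an assumption, since the record does not state them), and face detection, morphological operations, and the ANN/SVM classifiers are left out.

```python
import numpy as np

# Approximate NTSC RGB -> YIQ coefficients (assumed; not given in the record).
RGB_TO_YIQ = np.array([
    [0.299, 0.587, 0.114],    # Y: luminance
    [0.596, -0.274, -0.322],  # I: in-phase chrominance
    [0.211, -0.523, 0.312],   # Q: quadrature chrominance
])


def rgb_to_yiq(image):
    """Convert an (H, W, 3) RGB array with values in [0, 1] to YIQ."""
    return image @ RGB_TO_YIQ.T


def sobel_magnitude(channel):
    """Return the Sobel gradient magnitude of a 2-D array.

    The 3x3 kernels are applied as a sliding-window correlation with
    edge padding, so the output has the same shape as the input.
    """
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    padded = np.pad(channel.astype(float), 1, mode="edge")
    h, w = channel.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            window = padded[dy:dy + h, dx:dx + w]
            gx += kx[dy, dx] * window
            gy += ky[dy, dx] * window
    return np.hypot(gx, gy)


if __name__ == "__main__":
    # Synthetic frame: dark left half, bright right half.
    frame = np.zeros((6, 6, 3))
    frame[:, 3:, :] = 1.0
    luminance = rgb_to_yiq(frame)[..., 0]
    print(sobel_magnitude(luminance))
```

On a synthetic frame like the one in the usage block, the response is concentrated along the vertical boundary between the dark and bright halves. In a pipeline such as the one the summary describes, a thresholded edge map of this kind would then be cleaned up with morphological operations before contour features are extracted.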