Character-level arabic text generation from sign language video using encoder–decoder model

Video to text conversion is a vital activity in the field of computer vision. In recent years, deep learning algorithms have dominated automatic text generation in English, but there are a few research works available for other languages. In this paper, we propose a novel encoding–decoding system th...

Full description

Saved in:

Bibliographic Details
Published in:	Displays Vol. 76; p. 102340
Main Authors:	Boukdir, Abdelbasset, Benaddy, Mohamed, Meslouhi, Othmane El, Kardouchi, Mustapha, Akhloufi, Moulay
Format:	Journal Article
Language:	English
Published:	Elsevier B.V 01.01.2023
Subjects:	Arabic text Deep learning Gated Recurrent Unit Pose estimation Video caption Video caption Deep learning Gated Recurrent Unit Pose estimation Arabic text
ISSN:	0141-9382, 1872-7387
Online Access:	Get full text
Tags:	Add Tag No Tags, Be the first to tag this record!

Description
Summary:	Video to text conversion is a vital activity in the field of computer vision. In recent years, deep learning algorithms have dominated automatic text generation in English, but there are a few research works available for other languages. In this paper, we propose a novel encoding–decoding system that generates character-level Arabic sentences from isolated RGB videos of Moroccan sign language. The video sequence was encoded by a spatiotemporal feature extraction using pose estimation models, while the label text of the video is transmitted to a sequence of representative vectors. Both the features and the label vector are joined and treated by a decoder layer to derive a final prediction. We trained the proposed system on an isolated Moroccan Sign Language dataset (MoSLD), composed of RGB videos from 125 MoSL signs. The experimental results reveal that the proposed model attains the best performance under several evaluation metrics. •A method of generating character-level Arabic text from Moroccan sign language datasets has been proposed.•To the best of our knowledge, the proposed model is the first neural encoder–decoder model for Arabic video captioning.•Different landmark estimation schemes are used at the Arabic character level to improve the accuracy and interpretive performance of the results.
ISSN:	0141-9382 1872-7387
DOI:	10.1016/j.displa.2022.102340