Two-stream lightweight sign language transformer

Bibliographic Details
Published in: Machine Vision and Applications, Vol. 33, No. 5
Main Authors: Chen, Yuming; Mei, Xue; Qin, Xuan
Format: Journal Article
Language: English
Published: Berlin/Heidelberg: Springer Berlin Heidelberg (Springer Nature B.V.), 01.09.2022
ISSN: 0932-8092, 1432-1769
Description
Summary: Despite recent progress in video-based continuous sign language translation, many deep learning models are difficult to apply to real-time translation under limited computing resources. We present a two-stream lightweight sign transformer network for recognizing and translating continuous sign language. This lightweight framework captures both the static spatial information and the full-body dynamic features of the signer, and its transformer-style decoder architecture translates sentences in real time from the spatio-temporal context around the signer. Additionally, its attention mechanism focuses on the signer's moving hands and mouth, which are often crucial for the semantic understanding of sign language. In this paper, we introduce a Chinese sign language corpus of the business scene (CSLBS), which consists of 3080 high-quality videos and provides a strong impetus for further research on Chinese sign language translation. Experiments are carried out on PHOENIX-Weather 2014T (Camgoz et al., in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2018), pp 7784–7793, 2018), the Chinese Sign Language dataset (Huang et al., in: The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), pp 2257–2264, 2018), and our CSLBS; the proposed model outperforms the state of the art in inference time and accuracy using only raw RGB and RGB-difference frames as input.
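
The abstract describes a two-stream design over raw RGB and RGB-difference frames feeding a transformer-style decoder. Since only the abstract is available here, the following is a minimal PyTorch sketch of that general idea, not the authors' implementation: the class and helper names (TwoStreamSignTransformer, make_stream), the layer sizes, the per-frame convolutional encoders, and the additive fusion of the two streams are all illustrative assumptions.

import torch
import torch.nn as nn

class TwoStreamSignTransformer(nn.Module):
    """Sketch: appearance stream (raw RGB) + motion stream (frame
    differences) fused into memory for a transformer decoder."""

    def __init__(self, d_model=256, vocab_size=1000, nhead=4, num_layers=2):
        super().__init__()

        def make_stream():
            # Lightweight per-frame CNN encoder (sizes are assumptions).
            return nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(64, d_model),
            )

        self.rgb_stream = make_stream()    # static spatial information
        self.diff_stream = make_stream()   # dynamic (motion) features
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        self.embed = nn.Embedding(vocab_size, d_model)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, frames, tgt_tokens):
        # frames: (B, T, 3, H, W) raw RGB video clip.
        B, T = frames.shape[:2]
        # RGB-difference frames as a cheap motion cue (zero-pad frame 0).
        diff = torch.cat([torch.zeros_like(frames[:, :1]),
                          frames[:, 1:] - frames[:, :-1]], dim=1)
        rgb_feat = self.rgb_stream(frames.flatten(0, 1)).view(B, T, -1)
        diff_feat = self.diff_stream(diff.flatten(0, 1)).view(B, T, -1)
        memory = rgb_feat + diff_feat      # simple additive fusion (assumption)
        tgt = self.embed(tgt_tokens)       # (B, L, d_model) target tokens
        return self.out(self.decoder(tgt, memory))  # (B, L, vocab) logits

model = TwoStreamSignTransformer()
logits = model(torch.randn(2, 16, 3, 64, 64), torch.randint(0, 1000, (2, 10)))
print(logits.shape)  # torch.Size([2, 10, 1000])

The frame-difference stream here is a stand-in for optical flow that costs only a tensor subtraction, which is consistent with the abstract's emphasis on using raw RGB and RGB-difference frames to keep inference cheap.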
DOI: 10.1007/s00138-022-01330-w