Two-stream lightweight sign language transformer

Bibliographic Details
Published in: Machine vision and applications, Volume 33, Issue 5
Main authors: Chen, Yuming; Mei, Xue; Qin, Xuan
Format: Journal Article
Language: English
Published: Berlin/Heidelberg: Springer Berlin Heidelberg (Springer Nature B.V.), 01.09.2022
ISSN: 0932-8092, 1432-1769
Description
Summary: Despite recent progress in video-based continuous sign language translation, many deep learning models are difficult to apply to real-time translation under limited computing resources. We present a two-stream lightweight sign transformer network for recognizing and translating continuous sign language. This lightweight framework captures both the static spatial information and the dynamic full-body features of the signer, and its transformer-style decoder architecture translates sentences in real time from the spatio-temporal context around the signer. Additionally, its attention mechanism focuses on the signer's moving hands and mouth, which are often crucial for the semantic understanding of sign language. In this paper, we also introduce a Chinese sign language corpus of business scenes (CSLBS), which consists of 3080 high-quality videos and provides a strong impetus for further research on Chinese sign language translation. Experiments are carried out on PHOENIX-Weather 2014T (Camgoz et al., in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2018), pp 7784–7793, 2018), the Chinese Sign Language dataset (Huang et al., in: The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), pp 2257–2264, 2018), and our CSLBS; the proposed model outperforms the state of the art in inference time and accuracy using only raw RGB and RGB-difference frames as input.
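For concreteness, the following is a minimal PyTorch sketch of the two-stream idea summarized above: one lightweight CNN encodes raw RGB frames, a second encodes their frame-to-frame RGB differences, and a transformer-style decoder attends over the fused features to emit a translation. Every name, layer size, and the tiny convolutional backbones here are illustrative assumptions, not the paper's exact architecture; the hand/mouth attention mechanism and positional encodings are omitted for brevity.

# Sketch of a two-stream encoder + transformer decoder for sign translation.
# All layer sizes and names are assumptions; a real lightweight model would
# use a compact backbone such as MobileNet for each stream.
import torch
import torch.nn as nn

def rgb_difference(frames):
    """frames: (B, T, C, H, W) -> frame-to-frame difference (B, T-1, C, H, W)."""
    return frames[:, 1:] - frames[:, :-1]

class TwoStreamSignTransformer(nn.Module):
    def __init__(self, d_model=256, vocab_size=3000, nhead=4, num_layers=2):
        super().__init__()
        def make_stream():  # one small 2D CNN per stream
            return nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(64, d_model // 2),
            )
        self.rgb_stream = make_stream()   # static appearance
        self.diff_stream = make_stream()  # motion (RGB difference)
        self.embed = nn.Embedding(vocab_size, d_model)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead, batch_first=True),
            num_layers,
        )
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, frames, tokens):
        # frames: (B, T, 3, H, W); tokens: (B, L) target word indices.
        diff = rgb_difference(frames)
        frames = frames[:, 1:]  # align RGB stream with the shorter diff stream
        b, t = frames.shape[:2]
        flat = lambda x: x.reshape(b * t, *x.shape[2:])
        rgb_feat = self.rgb_stream(flat(frames)).reshape(b, t, -1)
        mot_feat = self.diff_stream(flat(diff)).reshape(b, t, -1)
        memory = torch.cat([rgb_feat, mot_feat], dim=-1)  # fuse the two streams
        tgt = self.embed(tokens)
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        return self.out(self.decoder(tgt, memory, tgt_mask=mask))

model = TwoStreamSignTransformer()
frames = torch.randn(2, 9, 3, 112, 112)              # short dummy clip
logits = model(frames, torch.randint(0, 3000, (2, 5)))
print(logits.shape)                                  # torch.Size([2, 5, 3000])

At inference time the decoder would be run autoregressively, feeding back its own predictions token by token, which is where the lightweight encoders matter most for real-time translation.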
DOI: 10.1007/s00138-022-01330-w