Spatial-temporal generative network based on deep long short-term memory autoencoder for hand skeleton data sequences reconstruction and recognition
| Published in: | Engineering Applications of Artificial Intelligence, Vol. 161, p. 112289 |
|---|---|
| Main authors: | , , |
| Medium: | Journal Article |
| Language: | English |
| Published: | Elsevier Ltd, 12.12.2025 |
| ISSN: | 0952-1976 |
| Online access: | Get full text |
| Summary: | Convolutional neural networks attract the highest research focus in the developing field of Hand Gesture Recognition (HGR). However, these approaches struggle to adapt to time-series data, and in skeleton-based HGR, extracting spatial–temporal information remains a challenge. Recurrent neural networks have recently shown exceptional performance in detecting hand gestures by processing variable-length time-series data. Although they outperform traditional methods when large training datasets are available, their effectiveness diminishes significantly when data availability is constrained. In this study, we introduce an unsupervised data augmentation network, the Spatial-Temporal Generative Network (STGN), which reconstructs both the spatial and temporal information of the input sequences by leveraging a Deep Long Short-Term Memory Auto-Encoder (DLSTM-AE). The DLSTM-AE is then combined with different Long Short-Term Memory (LSTM) network variants, forming an integrated network that can be trained end-to-end for HGR. Through experiments on the LeapGestureDB dataset (Leap Motion-based Gesture Dataset) and the RIT dataset (Rochester Institute of Technology Hand Gesture Dataset), we show that data reconstruction using the STGN has a prominent effect on improving the accuracy of recognizing time-series-based hand gestures. In all experiments, the best recognition results are achieved on the augmented dataset, with accuracies improving by 2 to 10% across all tested LSTM networks. For reproducible research, the code is available at: https://github.com/AMEURsafa/STGN. |
|---|---|
| DOI: | 10.1016/j.engappai.2025.112289 |

Highlights:
- A DLSTM-AE reconstructs the spatial and temporal representation of the input sequences.
- Generation of larger, reliable datasets enhances the generalizability of the models.
- HGR model: DLSTM-AE for data reconstruction and various RNN models for classification.
- Effectiveness and efficiency of the model are evaluated on two benchmark datasets.
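The abstract describes an LSTM autoencoder that compresses a hand-skeleton sequence and then reconstructs it, which is the mechanism behind the data augmentation. The following is a minimal NumPy sketch of that idea (forward pass only, no training); the layer sizes, 21-joint skeleton layout, and class names are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCell:
    """A single LSTM cell with the standard input/forget/output gates."""
    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        # One stacked weight matrix for the four gates: i, f, g, o.
        self.W = rng.standard_normal((4 * hidden_size, input_size + hidden_size)) * 0.1
        self.b = np.zeros(4 * hidden_size)

    def step(self, x, h, c):
        z = self.W @ np.concatenate([x, h]) + self.b
        H = self.hidden_size
        i = sigmoid(z[0:H])           # input gate
        f = sigmoid(z[H:2*H])         # forget gate
        g = np.tanh(z[2*H:3*H])       # candidate cell state
        o = sigmoid(z[3*H:4*H])       # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
        return h, c

class LSTMAutoencoder:
    """Encoder LSTM compresses the sequence into its final hidden state;
    the decoder LSTM unrolls that state back into a same-length sequence."""
    def __init__(self, feat_dim, hidden_size, rng):
        self.encoder = LSTMCell(feat_dim, hidden_size, rng)
        self.decoder = LSTMCell(hidden_size, hidden_size, rng)
        # Linear read-out projecting decoder states back to feature space.
        self.W_out = rng.standard_normal((feat_dim, hidden_size)) * 0.1

    def forward(self, seq):
        T, H = len(seq), self.encoder.hidden_size
        h = c = np.zeros(H)
        for x in seq:                 # encode frame by frame
            h, c = self.encoder.step(x, h, c)
        code = h                      # fixed-size latent summary of the sequence
        h_d = c_d = np.zeros(H)
        recon = []
        for _ in range(T):            # decode by feeding the code at each step
            h_d, c_d = self.decoder.step(code, h_d, c_d)
            recon.append(self.W_out @ h_d)
        return np.stack(recon)

rng = np.random.default_rng(0)
# A toy "hand skeleton" sequence: 20 frames of 63 features (21 joints x 3 coords).
seq = rng.standard_normal((20, 63))
ae = LSTMAutoencoder(feat_dim=63, hidden_size=32, rng=rng)
recon = ae.forward(seq)
print(recon.shape)  # reconstruction has the same shape as the input: (20, 63)
```

A trained version of such a network can generate reconstructed variants of each gesture sequence, which is how the abstract's augmented datasets would be produced before feeding them to the LSTM classifiers.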