Spatial-temporal generative network based on deep long short-term memory autoencoder for hand skeleton data sequences reconstruction and recognition
| Published in: | Engineering applications of artificial intelligence Vol. 161; p. 112289 |
|---|---|
| Main Authors: | , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier Ltd, 12.12.2025 |
| Subjects: | |
| ISSN: | 0952-1976 |
| Summary: | Convolutional neural networks attract the most research attention in the developing field of Hand Gesture Recognition (HGR). Nevertheless, these approaches struggle to adapt to time-series data, and in skeleton-based HGR, extracting spatial–temporal information remains a challenge. Recently, recurrent neural networks have shown strong performance in recognizing hand gestures by processing variable-length time-series data. Although they outperform traditional methods when large amounts of training data are available, their effectiveness diminishes significantly when data availability is constrained. In this study, we introduce an unsupervised data augmentation network, the Spatial-Temporal Generative Network (STGN), which reconstructs both the spatial and temporal information of the input sequences by leveraging a Deep Long Short-Term Memory Auto-Encoder (DLSTM-AE) network. The DLSTM-AE is then combined with different Long Short-Term Memory (LSTM) network variants, forming an integrated network that can be trained end-to-end for HGR. Through experiments on the LeapGestureDB dataset (Leap Motion-based Gesture Dataset) and the RIT dataset (Rochester Institute of Technology Hand Gesture Dataset), we show that data reconstruction using STGN has a pronounced effect on the accuracy of recognizing time-series-based hand gestures. In all experiments, the best recognition results are achieved on the augmented dataset, with accuracies improving by 2 to 10% across all tested LSTM networks. For reproducible research, the code is available at: https://github.com/AMEURsafa/STGN. |
|---|---|
| Highlights: | • A DLSTM-AE reconstructs the spatial and temporal representation of the input sequences. • Generation of larger, more reliable datasets, enhancing the generalizability of the models. • HGR model: DLSTM-AE for data reconstruction and various RNN models for classification. • Effectiveness and efficiency evaluation of the model on two benchmark datasets. |
| ISSN: | 0952-1976 |
| DOI: | 10.1016/j.engappai.2025.112289 |
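
The record above describes the STGN's core component: a deep LSTM autoencoder that reconstructs hand-skeleton sequences, whose outputs augment the training data of downstream LSTM classifiers. The following is a minimal, hypothetical PyTorch-style sketch of such an autoencoder; it is not the authors' implementation, and the class name, layer sizes, joint count, and training details are assumptions for illustration only.

```python
# Hypothetical sketch of a deep LSTM autoencoder (DLSTM-AE) for hand-skeleton
# sequence reconstruction, in the spirit of the STGN described in the abstract.
# NOT the authors' code; all names and dimensions are illustrative assumptions.
import torch
import torch.nn as nn


class DLSTMAutoencoder(nn.Module):
    def __init__(self, n_joints=22, coords=3, hidden=128, latent=64, num_layers=2):
        super().__init__()
        feat = n_joints * coords                      # flattened skeleton frame
        # Encoder: stacked LSTM compresses the sequence into a latent summary.
        self.encoder = nn.LSTM(feat, hidden, num_layers, batch_first=True)
        self.to_latent = nn.Linear(hidden, latent)
        # Decoder: stacked LSTM reconstructs the full spatial-temporal sequence.
        self.decoder = nn.LSTM(latent, hidden, num_layers, batch_first=True)
        self.to_frame = nn.Linear(hidden, feat)

    def forward(self, x):
        # x: (batch, time, joints * coords) skeleton sequence
        _, (h, _) = self.encoder(x)
        z = self.to_latent(h[-1])                     # (batch, latent) summary
        # Repeat the latent vector at every time step to drive the decoder.
        z_seq = z.unsqueeze(1).expand(-1, x.size(1), -1)
        out, _ = self.decoder(z_seq)
        return self.to_frame(out)                     # reconstructed sequence


# Usage: train with a frame-wise reconstruction loss; reconstructed sequences
# can then augment the training set of a downstream LSTM gesture classifier.
model = DLSTMAutoencoder()
seq = torch.randn(8, 60, 22 * 3)                      # 8 gestures, 60 frames each
loss = nn.functional.mse_loss(model(seq), seq)
loss.backward()
```

In such a setup, the autoencoder learns to reproduce both the per-frame joint layout (spatial) and its evolution over time (temporal); its reconstructions are then added to the training data, which is the augmentation effect the abstract credits for the reported 2 to 10% accuracy gains across the tested LSTM classifiers.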