Triple Stack Deep Variational Autoencoder For Improved Hand Gesture Recognition
| Published in: | International Conference on Computing, Communication, and Networking Technologies (Online), pp. 1 - 7 |
|---|---|
| Main Authors: | , , |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 24.06.2024 |
| ISSN: | 2473-7674 |
| Summary: | This paper proposes a novel approach to hand gesture recognition using a triple-stack deep variational autoencoder. By employing a VAE framework, we facilitate both efficient representation learning and the generation of meaningful latent spaces for gesture recognition and classification. This is an extension of the traditional VAE architecture that incorporates three layers of encoding and decoding together with a spatial deep neural network. In a regular VAE, there is typically only one layer in both the encoder and the decoder. By stacking three layers in each, the triple-stack deep VAE can learn more complex hierarchical representations of the input data: each encoder layer extracts increasingly abstract features from the input, while each decoder layer reconstructs the input from these abstract representations. The performance of the proposed model is evaluated in terms of accuracy, precision, recall, and F1-score on six benchmark datasets: ASL Static, the Massey University Dataset (MUGD), ASL Digit, NUS-II, Bengali Sign Language (BSL), and Hagrid-14 (HG-14). The experimental results show that the proposed 3S-DVE achieves accuracies of 76% (MUGD Set 1), 80% (MUGD Set 2), 97% (MUGD Set 3), 96% (MUGD Set 4), 86% (MUGD Set 5), 97% (ASL Digit), 62% (ASL Static), 74% (NUS-II), 66% (BSL), and 79% (HG-14), which is better than the state-of-the-art methods. |
|---|---|
| ISSN: | 2473-7674 |
| DOI: | 10.1109/ICCCNT61001.2024.10724125 |
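
The abstract describes the core architectural idea: stacking three encoder layers and three decoder layers so each stage learns progressively more abstract features, with the latent code serving both reconstruction and gesture classification. The sketch below illustrates that idea in Python/PyTorch; the fully connected layer widths, latent dimension, classification head, and loss weighting are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of a triple-stack (three-layer) variational autoencoder with a
# classification head for gestures. Layer sizes, latent dimension, and the loss
# weighting are assumptions made for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TripleStackVAE(nn.Module):
    def __init__(self, input_dim=64 * 64, latent_dim=32, num_classes=14):
        super().__init__()
        # Encoder: three stacked layers, each extracting more abstract features.
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 1024), nn.ReLU(),
            nn.Linear(1024, 256), nn.ReLU(),
            nn.Linear(256, 64), nn.ReLU(),
        )
        self.fc_mu = nn.Linear(64, latent_dim)      # mean of q(z|x)
        self.fc_logvar = nn.Linear(64, latent_dim)  # log-variance of q(z|x)
        # Decoder: three stacked layers mirroring the encoder.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64), nn.ReLU(),
            nn.Linear(64, 256), nn.ReLU(),
            nn.Linear(256, input_dim), nn.Sigmoid(),
        )
        # Gesture classifier on the latent code (hypothetical stand-in for the
        # paper's spatial deep neural network).
        self.classifier = nn.Linear(latent_dim, num_classes)

    def reparameterize(self, mu, logvar):
        # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I).
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(std)

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = self.reparameterize(mu, logvar)
        return self.decoder(z), self.classifier(z), mu, logvar

def vae_classification_loss(x, recon, logits, labels, mu, logvar, beta=1.0):
    # Standard VAE objective (reconstruction + KL divergence) plus cross-entropy
    # on the gesture labels; the beta weighting is an assumption.
    recon_loss = F.binary_cross_entropy(recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    ce = F.cross_entropy(logits, labels)
    return recon_loss + beta * kl + ce
```

In such a setup the model would be trained end-to-end on flattened gesture images, minimizing the combined reconstruction, KL-divergence, and classification loss; the paper's specific preprocessing and classifier details are not reproduced here.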