Multimodal Fusion for Thai Sign Language Recognition: Integrating RGB-Based CNN and Landmark-Based Features for Enhanced Gesture Recognition
This paper introduces a multimodal fusion model designed to improve the recognition of Thai Sign Language (TSL) gestures by combining RGB-based spatial features with landmark-based skeletal information. The proposed model employs ResNet-50, a deep Convolutional Neural Network (CNN) pre-trained on Im...
Uloženo v:
| Vydáno v: | 2025 13th International Electrical Engineering Congress (iEECON) s. 1 - 5 |
|---|---|
| Hlavní autoři: | , , |
| Médium: | Konferenční příspěvek |
| Jazyk: | angličtina |
| Vydáno: |
IEEE
05.03.2025
|
| Témata: | |
| On-line přístup: | Získat plný text |
| Tagy: |
Přidat tag
Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!
|
| Shrnutí: | This paper introduces a multimodal fusion model designed to improve the recognition of Thai Sign Language (TSL) gestures by combining RGB-based spatial features with landmark-based skeletal information. The proposed model employs ResNet-50, a deep Convolutional Neural Network (CNN) pre-trained on ImageNet, to extract detailed spatial features from RGB images, capturing the visual characteristics of hand gestures. In parallel, Google's MediaPipe library is used to obtain 2D hand landmarks, providing a coordinate representation of hand structures through x, y coordinate data. These RGB and landmark-based features are then fused to create a comprehensive representation that effectively captures both the appearance and structural details of gestures. The model's performance was rigorously evaluated using a TSL dataset, achieving an accuracy of 94.6%, precision of 0.937, recall of 0.921, and an F1-score of 0.929, significantly outperforming traditional machine learning models and standalone CNN architectures: VGG-16 and ResNet-50 alone. This study highlights the advantages of integrating spatial and skeletal features to enhance accuracy and robustness, especially in applications requiring precise recognition of complex hand gestures under varied conditions. |
|---|---|
| DOI: | 10.1109/iEECON64081.2025.10987852 |