Multimodal Fusion for Thai Sign Language Recognition: Integrating RGB-Based CNN and Landmark-Based Features for Enhanced Gesture Recognition
This paper introduces a multimodal fusion model designed to improve the recognition of Thai Sign Language (TSL) gestures by combining RGB-based spatial features with landmark-based skeletal information. The proposed model employs ResNet-50, a deep Convolutional Neural Network (CNN) pre-trained on Im...
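
The combination of RGB features and landmark features described in the abstract can be sketched as below. This is a minimal illustration, not the authors' implementation: the record does not name the fusion operator, so plain concatenation is an assumption, as is the use of a single hand. The function name `fuse_features` and the random stand-in inputs are hypothetical. The two sizes are standard facts: ResNet-50 yields a 2048-dimensional pooled feature vector, and MediaPipe Hands reports 21 landmarks per hand.

```python
import random

RESNET50_FEATURE_DIM = 2048  # length of ResNet-50's pooled feature vector
NUM_HAND_LANDMARKS = 21      # points MediaPipe Hands reports per hand


def fuse_features(rgb_features, landmarks_xy):
    """Concatenate CNN features with flattened (x, y) landmarks.

    Concatenation is an assumed fusion operator; the abstract does not specify one.
    """
    if len(rgb_features) != RESNET50_FEATURE_DIM:
        raise ValueError("expected a 2048-dimensional RGB feature vector")
    if len(landmarks_xy) != NUM_HAND_LANDMARKS:
        raise ValueError("expected 21 (x, y) hand landmarks")
    # Flatten [(x0, y0), (x1, y1), ...] into [x0, y0, x1, y1, ...].
    flat_landmarks = [coord for point in landmarks_xy for coord in point]
    return list(rgb_features) + flat_landmarks


# Random stand-ins for real extractor outputs; for illustration only.
rng = random.Random(0)
rgb = [rng.gauss(0.0, 1.0) for _ in range(RESNET50_FEATURE_DIM)]
hand = [(rng.random(), rng.random()) for _ in range(NUM_HAND_LANDMARKS)]

fused = fuse_features(rgb, hand)
print(len(fused))  # 2090 = 2048 + 21 * 2
```

In a real pipeline the fused vector would feed a classifier head; the two inputs differ greatly in length and scale, so some normalization or a learned projection before fusion is a likely design choice, though the record does not say which one the paper uses.
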
| Published in: | 2025 13th International Electrical Engineering Congress (iEECON) pp. 1 - 5 |
|---|---|
| Main Authors: | Wuttichai Vijitkunsawat, Anan Sopin, Anusorn Sathusen |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 05.03.2025 |
| Abstract | This paper introduces a multimodal fusion model that improves recognition of Thai Sign Language (TSL) gestures by combining RGB-based spatial features with landmark-based skeletal information. The model uses ResNet-50, a deep Convolutional Neural Network (CNN) pre-trained on ImageNet, to extract spatial features from RGB images, capturing the visual appearance of hand gestures. In parallel, Google's MediaPipe library extracts 2D hand landmarks, which represent hand structure as (x, y) coordinates. The RGB and landmark features are then fused into a single representation that captures both the appearance and the structure of each gesture. Evaluated on a TSL dataset, the model achieves an accuracy of 94.6%, a precision of 0.937, a recall of 0.921, and an F1-score of 0.929, outperforming traditional machine learning models and two standalone CNN architectures, VGG-16 and ResNet-50. The study highlights the benefit of integrating spatial and skeletal features for accuracy and robustness, especially in applications that require precise recognition of complex hand gestures under varied conditions. |
|---|---|
| Author | Sopin, Anan; Vijitkunsawat, Wuttichai; Sathusen, Anusorn |
| Author_xml | 1. Vijitkunsawat, Wuttichai (wuttichai.v@mail.rmutk.ac.th), Rajamangala University of Technology Krungthep, Electronics and Telecommunication Engineering, Bangkok, Thailand; 2. Sopin, Anan (anan.s@mail.rmutk.ac.th), Rajamangala University of Technology Krungthep, Electronics and Telecommunication Engineering, Bangkok, Thailand; 3. Sathusen, Anusorn (anusorn.s@mail.rmutk.ac.th), Rajamangala University of Technology Krungthep, Electronics and Telecommunication Engineering, Bangkok, Thailand |
| ContentType | Conference Proceeding |
| DOI | 10.1109/iEECON64081.2025.10987852 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Xplore POP ALL; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP All) 1998-Present |
| EISBN | 9798331543952 |
| EndPage | 5 |
| ExternalDocumentID | 10987852 |
| Genre | orig-research |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| PageCount | 5 |
| PublicationCentury | 2000 |
| PublicationDate | 2025-March-5 |
| PublicationDateYYYYMMDD | 2025-03-05 |
| PublicationDecade | 2020 |
| PublicationTitle | 2025 13th International Electrical Engineering Congress (iEECON) |
| PublicationTitleAbbrev | iEECON |
| PublicationYear | 2025 |
| Publisher | IEEE |
| StartPage | 1 |
| SubjectTerms | Accuracy; Convolutional neural networks; Deep learning; deep learning models; Electrical engineering; Feature extraction; Hands; landmark-based; Libraries; multi-modal; RGB-based; Robustness; Sign language; sign language recognition; Visualization |
| Title | Multimodal Fusion for Thai Sign Language Recognition: Integrating RGB-Based CNN and Landmark-Based Features for Enhanced Gesture Recognition |
| URI | https://ieeexplore.ieee.org/document/10987852 |
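
The metrics reported in the abstract can be cross-checked with a few lines of arithmetic. The sketch below assumes the F1-score is the harmonic mean of the reported precision and recall; if the paper averages per-class F1 values instead, the two numbers need not match exactly. The function name `f1_score` is chosen here for illustration and does not come from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract.
reported_precision = 0.937
reported_recall = 0.921
reported_f1 = 0.929

computed_f1 = f1_score(reported_precision, reported_recall)
print(round(computed_f1, 3))  # 0.929
```

Under that assumption the computed value rounds to the reported F1-score, so the three figures are mutually consistent.
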