Multimodal Fusion for Thai Sign Language Recognition: Integrating RGB-Based CNN and Landmark-Based Features for Enhanced Gesture Recognition

This paper introduces a multimodal fusion model designed to improve the recognition of Thai Sign Language (TSL) gestures by combining RGB-based spatial features with landmark-based skeletal information. The proposed model employs ResNet-50, a deep Convolutional Neural Network (CNN) pre-trained on Im...

Bibliographic Details
Published in:	2025 13th International Electrical Engineering Congress (iEECON), pp. 1-5
Main Authors:	Vijitkunsawat, Wuttichai; Sopin, Anan; Sathusen, Anusorn
Format:	Conference Paper
Language:	English
Published:	IEEE, 5 March 2025
Subjects:	Accuracy; Convolutional neural networks; Deep learning; deep learning models; Electrical engineering; Feature extraction; Hands; landmark-based; Libraries; multi-modal; RGB-based; Robustness; Sign language; sign language recognition; Visualization

Abstract	This paper introduces a multimodal fusion model designed to improve the recognition of Thai Sign Language (TSL) gestures by combining RGB-based spatial features with landmark-based skeletal information. The proposed model employs ResNet-50, a deep Convolutional Neural Network (CNN) pre-trained on ImageNet, to extract detailed spatial features from RGB images, capturing the visual characteristics of hand gestures. In parallel, Google's MediaPipe library is used to obtain 2D hand landmarks, providing a coordinate representation of hand structures through x, y coordinate data. These RGB and landmark-based features are then fused to create a comprehensive representation that effectively captures both the appearance and structural details of gestures. The model's performance was rigorously evaluated using a TSL dataset, achieving an accuracy of 94.6%, precision of 0.937, recall of 0.921, and an F1-score of 0.929, significantly outperforming traditional machine learning models and standalone CNN architectures: VGG-16 and ResNet-50 alone. This study highlights the advantages of integrating spatial and skeletal features to enhance accuracy and robustness, especially in applications requiring precise recognition of complex hand gestures under varied conditions.
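The fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the fusion operator, so plain concatenation is assumed here. The 2048-dimensional RGB vector is the standard pooled output size of ResNet-50, and 21 landmarks per hand is what MediaPipe Hands returns. A single hand per image is also an assumption, and the names `fuse`, `flatten_landmarks`, and `f1_score` are hypothetical.

```python
from typing import List, Sequence, Tuple

RGB_DIM = 2048       # pooled feature size of a standard ResNet-50 backbone
NUM_LANDMARKS = 21   # landmarks per hand returned by MediaPipe Hands


def flatten_landmarks(points: Sequence[Tuple[float, float]]) -> List[float]:
    """Flatten (x, y) landmark pairs into [x0, y0, x1, y1, ...]."""
    if len(points) != NUM_LANDMARKS:
        raise ValueError(f"expected {NUM_LANDMARKS} landmarks, got {len(points)}")
    flat: List[float] = []
    for x, y in points:
        flat.extend((float(x), float(y)))
    return flat


def fuse(rgb_features: Sequence[float],
         landmarks: Sequence[Tuple[float, float]]) -> List[float]:
    """Concatenate RGB features with flattened landmarks.

    Concatenation is an assumed operator, not one confirmed by the abstract.
    """
    if len(rgb_features) != RGB_DIM:
        raise ValueError(f"expected {RGB_DIM} RGB features, got {len(rgb_features)}")
    return list(rgb_features) + flatten_landmarks(landmarks)


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Placeholder inputs only; real features would come from the two extractors.
fused = fuse([0.0] * RGB_DIM, [(0.5, 0.5)] * NUM_LANDMARKS)
print(len(fused))
print(round(f1_score(0.937, 0.921), 3))
```

Under these assumptions the fused vector has 2048 + 42 = 2090 entries, and the reported precision (0.937) and recall (0.921) give an F1-score that rounds to the reported 0.929.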
Author	Vijitkunsawat, Wuttichai; Sopin, Anan; Sathusen, Anusorn
Author_xml	1. Vijitkunsawat, Wuttichai (wuttichai.v@mail.rmutk.ac.th), Rajamangala University of Technology Krungthep, Electronics and Telecommunication Engineering, Bangkok, Thailand
	2. Sopin, Anan (anan.s@mail.rmutk.ac.th), Rajamangala University of Technology Krungthep, Electronics and Telecommunication Engineering, Bangkok, Thailand
	3. Sathusen, Anusorn (anusorn.s@mail.rmutk.ac.th), Rajamangala University of Technology Krungthep, Electronics and Telecommunication Engineering, Bangkok, Thailand
ContentType	Conference Proceeding
DOI	10.1109/iEECON64081.2025.10987852
EISBN	9798331543952
EndPage	5
ExternalDocumentID	10987852
Genre	orig-research
IsPeerReviewed	false
IsScholarly	false
Language	English
PageCount	5
PublicationDate	2025-03-05
PublicationTitle	2025 13th International Electrical Engineering Congress (iEECON)
PublicationTitleAbbrev	iEECON
PublicationYear	2025
Publisher	IEEE
StartPage	1
URI	https://ieeexplore.ieee.org/document/10987852