Multimodal Fusion for Thai Sign Language Recognition: Integrating RGB-Based CNN and Landmark-Based Features for Enhanced Gesture Recognition

Detailed bibliography
Published in: 2025 13th International Electrical Engineering Congress (iEECON), pp. 1-5
Main authors: Vijitkunsawat, Wuttichai; Sopin, Anan; Sathusen, Anusorn
Format: Conference paper
Language: English
Published: IEEE, 5 March 2025
Abstract: This paper introduces a multimodal fusion model designed to improve the recognition of Thai Sign Language (TSL) gestures by combining RGB-based spatial features with landmark-based skeletal information. The proposed model employs ResNet-50, a deep Convolutional Neural Network (CNN) pre-trained on ImageNet, to extract detailed spatial features from RGB images, capturing the visual characteristics of hand gestures. In parallel, Google's MediaPipe library is used to obtain 2D hand landmarks, which represent the structure of the hand as (x, y) coordinates. The RGB and landmark-based features are then fused into a single representation that captures both the appearance and the structural details of gestures. The model was evaluated on a TSL dataset, achieving an accuracy of 94.6%, precision of 0.937, recall of 0.921, and an F1-score of 0.929, significantly outperforming traditional machine learning models and two standalone CNN architectures, VGG-16 and ResNet-50. This study highlights the advantages of integrating spatial and skeletal features to enhance accuracy and robustness, especially in applications requiring precise recognition of complex hand gestures under varied conditions.
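The fusion step described in the abstract can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which fusion operator is used, so plain concatenation is assumed here, and the names fuse_features, flatten_landmarks and f1_score are hypothetical. The 2048-value image vector corresponds to ResNet-50's pooled output and the 21 points per hand to MediaPipe's hand landmark model; how the paper handles two hands or missed detections is not stated and is not modelled. The last function checks that the reported F1-score agrees with the reported precision and recall.

```python
"""Feature-level fusion sketch (hypothetical; not the authors' code)."""

# ResNet-50's global-average-pooled output has 2048 channels.
RESNET50_FEATURE_DIM = 2048
# MediaPipe Hands reports 21 landmarks per detected hand.
NUM_HAND_LANDMARKS = 21


def flatten_landmarks(landmarks):
    """Turn 21 (x, y) pairs into a flat list [x0, y0, x1, y1, ...]."""
    if len(landmarks) != NUM_HAND_LANDMARKS:
        raise ValueError(
            f"expected {NUM_HAND_LANDMARKS} landmarks, got {len(landmarks)}"
        )
    return [float(value) for point in landmarks for value in point]


def fuse_features(cnn_features, landmarks):
    """Concatenate image features with flattened landmark coordinates.

    Concatenation is an assumption made for illustration; the abstract
    does not state which fusion operator the paper uses.
    """
    if len(cnn_features) != RESNET50_FEATURE_DIM:
        raise ValueError(
            f"expected {RESNET50_FEATURE_DIM} CNN features, got {len(cnn_features)}"
        )
    return [float(v) for v in cnn_features] + flatten_landmarks(landmarks)


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Placeholder inputs standing in for real ResNet-50 and MediaPipe outputs.
    dummy_cnn = [0.0] * RESNET50_FEATURE_DIM
    dummy_hand = [(i / 21, i / 42) for i in range(NUM_HAND_LANDMARKS)]
    fused = fuse_features(dummy_cnn, dummy_hand)
    print(len(fused))  # 2048 + 21 * 2 = 2090
    print(round(f1_score(0.937, 0.921), 3))  # 0.929
```

Under these assumptions the fused vector has 2090 entries per image, which a downstream classifier would consume, and the reported precision (0.937) and recall (0.921) reproduce the reported F1-score of 0.929 to three decimals.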
Author Sopin, Anan
Vijitkunsawat, Wuttichai
Sathusen, Anusorn
Author_xml – sequence: 1
  givenname: Wuttichai
  surname: Vijitkunsawat
  fullname: Vijitkunsawat, Wuttichai
  email: wuttichai.v@mail.rmutk.ac.th
  organization: Rajamangala University of Technology Krungthep, Electronics and Telecommunication Engineering, Bangkok, Thailand
– sequence: 2
  givenname: Anan
  surname: Sopin
  fullname: Sopin, Anan
  email: anan.s@mail.rmutk.ac.th
  organization: Rajamangala University of Technology Krungthep, Electronics and Telecommunication Engineering, Bangkok, Thailand
– sequence: 3
  givenname: Anusorn
  surname: Sathusen
  fullname: Sathusen, Anusorn
  email: anusorn.s@mail.rmutk.ac.th
  organization: Rajamangala University of Technology Krungthep, Electronics and Telecommunication Engineering, Bangkok, Thailand
ContentType Conference Proceeding
DOI 10.1109/iEECON64081.2025.10987852
EISBN 9798331543952
EndPage 5
ExternalDocumentID 10987852
Genre orig-research
IsPeerReviewed false
IsScholarly false
Language English
PublicationCentury 2000
PublicationDate 2025-March-5
PublicationDateYYYYMMDD 2025-03-05
PublicationDate_xml – month: 03
  year: 2025
  text: 2025-March-5
  day: 05
PublicationDecade 2020
PublicationTitle 2025 13th International Electrical Engineering Congress (iEECON)
PublicationTitleAbbrev iEECON
PublicationYear 2025
Publisher IEEE
Publisher_xml – name: IEEE
SourceID ieee
SourceType Publisher
StartPage 1
SubjectTerms Accuracy
Convolutional neural networks
Deep learning
deep learning models
Electrical engineering
Feature extraction
Hands
landmark-based
Libraries
multi-modal
RGB-based
Robustness
Sign language
sign language recognition
Visualization
Title Multimodal Fusion for Thai Sign Language Recognition: Integrating RGB-Based CNN and Landmark-Based Features for Enhanced Gesture Recognition
URI https://ieeexplore.ieee.org/document/10987852