Multimodal Fusion for Thai Sign Language Recognition: Integrating RGB-Based CNN and Landmark-Based Features for Enhanced Gesture Recognition

Bibliographic details
Published in: 2025 13th International Electrical Engineering Congress (iEECON), pp. 1-5
Main authors: Vijitkunsawat, Wuttichai; Sopin, Anan; Sathusen, Anusorn
Format: Conference paper
Language: English
Publication details: IEEE, 5 March 2025
Abstract This paper introduces a multimodal fusion model designed to improve the recognition of Thai Sign Language (TSL) gestures by combining RGB-based spatial features with landmark-based skeletal information. The proposed model employs ResNet-50, a deep Convolutional Neural Network (CNN) pre-trained on ImageNet, to extract detailed spatial features from RGB images, capturing the visual characteristics of hand gestures. In parallel, Google's MediaPipe library is used to obtain 2D hand landmarks, providing a coordinate representation of hand structures through x, y coordinate data. These RGB and landmark-based features are then fused to create a comprehensive representation that effectively captures both the appearance and structural details of gestures. The model's performance was rigorously evaluated using a TSL dataset, achieving an accuracy of 94.6%, precision of 0.937, recall of 0.921, and an F1-score of 0.929, significantly outperforming traditional machine learning models and standalone CNN architectures: VGG-16 and ResNet-50 alone. This study highlights the advantages of integrating spatial and skeletal features to enhance accuracy and robustness, especially in applications requiring precise recognition of complex hand gestures under varied conditions.
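The abstract says the RGB and landmark features are fused but does not state how. As a minimal sketch only, the example below assumes feature-level fusion by concatenation: a 2048-dimensional vector (the size of ResNet-50's pooled output) is joined with 21 hand landmarks given as (x, y) pairs (the number MediaPipe Hands reports per hand). The helper names `flatten_landmarks`, `l2_normalize`, and `fuse` are hypothetical, the per-modality L2 normalisation is an added assumption, and random numbers stand in for real network outputs and detected landmarks.

```python
import math
import random

RESNET50_FEATURE_DIM = 2048  # length of ResNet-50's pooled feature vector
NUM_HAND_LANDMARKS = 21      # landmarks MediaPipe Hands reports per hand


def flatten_landmarks(landmarks):
    """Turn [(x0, y0), (x1, y1), ...] into [x0, y0, x1, y1, ...]."""
    if len(landmarks) != NUM_HAND_LANDMARKS:
        raise ValueError(f"expected {NUM_HAND_LANDMARKS} landmarks, got {len(landmarks)}")
    return [coord for point in landmarks for coord in point]


def l2_normalize(vec):
    """Scale a vector to unit length so neither modality dominates by magnitude."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def fuse(rgb_features, landmarks):
    """Assumed fusion rule (not confirmed by the source): normalise, then concatenate."""
    if len(rgb_features) != RESNET50_FEATURE_DIM:
        raise ValueError(f"expected {RESNET50_FEATURE_DIM} RGB features")
    return l2_normalize(rgb_features) + l2_normalize(flatten_landmarks(landmarks))


# Placeholder inputs: random values instead of real CNN features and landmarks.
rng = random.Random(0)
rgb = [rng.random() for _ in range(RESNET50_FEATURE_DIM)]
hand = [(rng.random(), rng.random()) for _ in range(NUM_HAND_LANDMARKS)]

fused = fuse(rgb, hand)
print(len(fused))  # 2048 + 2 * 21 = 2090
```

In a full pipeline the fused vector would feed a classifier over the gesture classes; the paper may instead fuse at a different stage or with learned weights, so this shows only the general idea of combining appearance and skeletal features.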
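The reported precision, recall, and F1-score can be checked against each other, since F1 is the harmonic mean of precision and recall. The short check below assumes the three figures come from the same averaging scheme, which the abstract does not specify; `f1_score` is a hypothetical helper written here, not a library call.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract.
f1 = f1_score(0.937, 0.921)
print(round(f1, 3))  # 0.929, matching the reported F1-score
```

If the metrics were macro-averaged per class, the F1 of the averaged precision and recall need not equal the macro F1 exactly, so agreement here is a plausibility check rather than a proof.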
Authors
  1. Vijitkunsawat, Wuttichai (wuttichai.v@mail.rmutk.ac.th), Rajamangala University of Technology Krungthep, Electronics and Telecommunication Engineering, Bangkok, Thailand
  2. Sopin, Anan (anan.s@mail.rmutk.ac.th), Rajamangala University of Technology Krungthep, Electronics and Telecommunication Engineering, Bangkok, Thailand
  3. Sathusen, Anusorn (anusorn.s@mail.rmutk.ac.th), Rajamangala University of Technology Krungthep, Electronics and Telecommunication Engineering, Bangkok, Thailand
ContentType Conference Proceeding
DOI 10.1109/iEECON64081.2025.10987852
EISBN 9798331543952
EndPage 5
ExternalDocumentID 10987852
Genre orig-research
IsPeerReviewed false
IsScholarly false
Language English
PageCount 5
ParticipantIDs ieee_primary_10987852
PublicationDate 2025-03-05
PublicationTitle 2025 13th International Electrical Engineering Congress (iEECON)
PublicationTitleAbbrev iEECON
PublicationYear 2025
Publisher IEEE
StartPage 1
SubjectTerms Accuracy
Convolutional neural networks
Deep learning
deep learning models
Electrical engineering
Feature extraction
Hands
landmark-based
Libraries
multi-modal
RGB-based
Robustness
Sign language
sign language recognition
Visualization
Title Multimodal Fusion for Thai Sign Language Recognition: Integrating RGB-Based CNN and Landmark-Based Features for Enhanced Gesture Recognition
URI https://ieeexplore.ieee.org/document/10987852