A Comparative Study of Video-Based Human Representations for American Sign Language Alphabet Generation
| Published in: | IEEE International Conference and Workshops on Automatic Face and Gesture Recognition : FG, pp. 1 - 6 |
|---|---|
| Main authors: | Xu, Fei; Chaudhary, Lipisha; Dong, Lu; Setlur, Srirangaraj; Govindaraju, Venu; Nwogu, Ifeoma (University at Buffalo, Department of Computer Science and Engineering, Buffalo, New York, USA) |
| Format: | Conference paper |
| Language: | English |
| Published: | IEEE, 27.05.2024 |
| Topics: | Avatars; Measurement; Sign language; Three-dimensional displays; Transformers; Visualization; Vocabulary |
| ISSN: | 2770-8330 |
| ISBN (electronic): | 9798350394948 |
| DOI: | 10.1109/FG59268.2024.10582020 |
| Online access: | Get full text (https://ieeexplore.ieee.org/document/10582020) |
| Abstract | Sign language is a complex visual language, and automatic interpretations of sign language can facilitate communication involving deaf individuals. As one of the essential components of sign language, fingerspelling connects the natural spoken languages to the sign language and expands the scale of sign language vocabulary. In practice, it is challenging to analyze fingerspelling alphabets due to their signing speed and small motion range. The usage of synthetic data has the potential to further improve fingerspelling alphabet analysis at scale. In this paper, we evaluate how different video-based human representations perform in a framework for Alphabet Generation for American Sign Language (ASL). We tested three mainstream video-based human representations: two-stream inflated 3D ConvNet, 3D landmarks of body joints, and rotation matrices of body joints. We also evaluated the effect of different skeleton graphs and selected body joints. The generation process of ASL fingerspelling used a transformer-based Conditional Variational Autoencoder. To train the model, we collected ASL alphabet signing videos from 17 signers with dynamic alphabet signing. The generated alphabets were evaluated using automatic metrics of quality such as FID, and we also considered supervised metrics by recognizing the generated entries using Spatio-Temporal Graph Convolutional Networks. Our experiments show that using the rotation matrices of the upper body joints and the signing hand gives the best results for the generation of ASL alphabet signing. Going forward, our goal is to produce articulated fingerspelling words by combining individual alphabets learned in this work. |
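The abstract names a transformer-based Conditional Variational Autoencoder over joint-rotation representations as the generation model. As a rough, illustrative sketch of that general idea only, and not the authors' implementation, the following PyTorch code conditions a sequence-level latent on an alphabet-letter label and decodes flattened per-joint rotation matrices; the joint count, sequence length, layer sizes, and class count are all assumptions made for the example.

```python
# Illustrative sketch only: a letter-conditioned VAE with transformer blocks over
# sequences of flattened 3x3 joint rotation matrices. All hyperparameters below
# (joint count, sequence length, layer sizes) are assumptions, not values from the paper.
import torch
import torch.nn as nn


class LetterConditionedCVAE(nn.Module):
    def __init__(self, n_joints=9, n_letters=26, d_model=128, latent_dim=64, seq_len=32):
        super().__init__()
        self.seq_len = seq_len
        frame_dim = n_joints * 9  # one flattened 3x3 rotation matrix per joint
        self.frame_in = nn.Linear(frame_dim, d_model)
        self.letter_emb = nn.Embedding(n_letters, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.to_mu = nn.Linear(d_model, latent_dim)
        self.to_logvar = nn.Linear(d_model, latent_dim)
        self.from_z = nn.Linear(latent_dim, d_model)
        dec_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerEncoder(dec_layer, num_layers=2)  # non-autoregressive decoder
        self.frame_out = nn.Linear(d_model, frame_dim)

    def encode(self, motion, letter):
        # motion: (B, T, n_joints * 9); letter: (B,) integer class labels
        h = self.frame_in(motion) + self.letter_emb(letter).unsqueeze(1)
        h = self.encoder(h).mean(dim=1)  # pool over time into one vector per clip
        return self.to_mu(h), self.to_logvar(h)

    def decode(self, z, letter):
        # broadcast the latent plus letter embedding across every output time step
        h = (self.from_z(z) + self.letter_emb(letter)).unsqueeze(1)
        h = h.expand(-1, self.seq_len, -1)
        return self.frame_out(self.decoder(h))

    def forward(self, motion, letter):
        mu, logvar = self.encode(motion, letter)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        recon = self.decode(z, letter)
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return recon, kl


# Toy usage: reconstruct a random batch and combine reconstruction and KL terms.
model = LetterConditionedCVAE()
motion = torch.randn(4, 32, 9 * 9)      # 4 clips, 32 frames, 9 joints (assumed shapes)
letters = torch.randint(0, 26, (4,))    # hypothetical letter labels
recon, kl = model(motion, letters)
loss = nn.functional.mse_loss(recon, motion) + 0.1 * kl
```

At sampling time one would draw z from a standard normal and call decode(z, letter); the generated sequences could then be scored with FID-style feature statistics and a separate recognizer, as the abstract describes.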