A Comparative Study of Video-Based Human Representations for American Sign Language Alphabet Generation
| Published in: | IEEE International Conference and Workshops on Automatic Face and Gesture Recognition : FG, pp. 1 - 6 |
|---|---|
| Main authors: | Xu, Fei; Chaudhary, Lipisha; Dong, Lu; Setlur, Srirangaraj; Govindaraju, Venu; Nwogu, Ifeoma (University at Buffalo, Department of Computer Science and Engineering, Buffalo, New York, USA) |
| Format: | Conference paper |
| Language: | English |
| Published: | IEEE, 27.05.2024 |
| Topics: | Avatars; Measurement; Sign language; Three-dimensional displays; Transformers; Visualization; Vocabulary |
| ISSN: | 2770-8330 |
| ISBN (electronic): | 9798350394948 |
| DOI: | 10.1109/FG59268.2024.10582020 |
| Online access: | Get full text (https://ieeexplore.ieee.org/document/10582020) |
| Abstract | Sign language is a complex visual language, and automatic interpretations of sign language can facilitate communication involving deaf individuals. As one of the essential components of sign language, fingerspelling connects the natural spoken languages to the sign language and expands the scale of sign language vocabulary. In practice, it is challenging to analyze fingerspelling alphabets due to their signing speed and small motion range. The usage of synthetic data has the potential to further improve fingerspelling alphabet analysis at scale. In this paper, we evaluate how different video-based human representations perform in a framework for Alphabet Generation for American Sign Language (ASL). We tested three mainstream video-based human representations: two-stream inflated 3D ConvNet, 3D landmarks of body joints, and rotation matrices of body joints. We also evaluated the effect of different skeleton graphs and selected body joints. The generation process of ASL fingerspelling used a transformer-based Conditional Variational Autoencoder. To train the model, we collected ASL alphabet signing videos from 17 signers with dynamic alphabet signing. The generated alphabets were evaluated using automatic metrics of quality such as FID, and we also considered supervised metrics by recognizing the generated entries using Spatio-Temporal Graph Convolutional Networks. Our experiments show that using the rotation matrices of the upper body joints and the signing hand gives the best results for the generation of ASL alphabet signing. Going forward, our goal is to produce articulated fingerspelling words by combining individual alphabets learned in this work. |
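The abstract names a transformer-based Conditional Variational Autoencoder over joint-rotation representations as the generation model. As a rough, illustrative sketch of that general idea only, and not the authors' implementation, the following PyTorch code conditions a sequence-level latent on an alphabet-letter label and decodes flattened per-joint rotation matrices; the joint count, sequence length, layer sizes, and class count are all assumptions made for the example.

```python
# Illustrative sketch only: a letter-conditioned VAE with transformer blocks over
# sequences of flattened 3x3 joint rotation matrices. All hyperparameters below
# (joint count, sequence length, layer sizes) are assumptions, not values from the paper.
import torch
import torch.nn as nn


class LetterConditionedCVAE(nn.Module):
    def __init__(self, n_joints=9, n_letters=26, d_model=128, latent_dim=64, seq_len=32):
        super().__init__()
        self.seq_len = seq_len
        frame_dim = n_joints * 9  # one flattened 3x3 rotation matrix per joint
        self.frame_in = nn.Linear(frame_dim, d_model)
        self.letter_emb = nn.Embedding(n_letters, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.to_mu = nn.Linear(d_model, latent_dim)
        self.to_logvar = nn.Linear(d_model, latent_dim)
        self.from_z = nn.Linear(latent_dim, d_model)
        dec_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerEncoder(dec_layer, num_layers=2)  # non-autoregressive decoder
        self.frame_out = nn.Linear(d_model, frame_dim)

    def encode(self, motion, letter):
        # motion: (B, T, n_joints * 9); letter: (B,) integer class labels
        h = self.frame_in(motion) + self.letter_emb(letter).unsqueeze(1)
        h = self.encoder(h).mean(dim=1)  # pool over time into one vector per clip
        return self.to_mu(h), self.to_logvar(h)

    def decode(self, z, letter):
        # broadcast the latent plus letter embedding across every output time step
        h = (self.from_z(z) + self.letter_emb(letter)).unsqueeze(1)
        h = h.expand(-1, self.seq_len, -1)
        return self.frame_out(self.decoder(h))

    def forward(self, motion, letter):
        mu, logvar = self.encode(motion, letter)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        recon = self.decode(z, letter)
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return recon, kl


# Toy usage: reconstruct a random batch and combine reconstruction and KL terms.
model = LetterConditionedCVAE()
motion = torch.randn(4, 32, 9 * 9)      # 4 clips, 32 frames, 9 joints (assumed shapes)
letters = torch.randint(0, 26, (4,))    # hypothetical letter labels
recon, kl = model(motion, letters)
loss = nn.functional.mse_loss(recon, motion) + 0.1 * kl
```

At sampling time one would draw z from a standard normal and call decode(z, letter); the generated sequences could then be scored with FID-style feature statistics and a separate recognizer, as the abstract describes.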