Part-Aware Unified Representation of Language and Skeleton for Zero-Shot Action Recognition
| Published in: | Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online), pp. 18761-18770 |
|---|---|
| Main authors: | Zhu, Anqi; Ke, Qiuhong; Gong, Mingming; Bailey, James |
| Medium: | Conference paper |
| Language: | English |
| Published: | IEEE, 16 June 2024 |
| ISSN: | 1063-6919 |
| Abstract | While remarkable progress has been made on supervised skeleton-based action recognition, the challenge of zero-shot recognition remains relatively unexplored. In this paper, we argue that relying solely on aligning label-level semantics and global skeleton features is insufficient to effectively transfer locally consistent visual knowledge from seen to unseen classes. To address this limitation, we introduce Part-aware Unified Representation between Language and Skeleton (PURLS) to explore visual-semantic alignment at both local and global scales. PURLS introduces a new prompting module and a novel partitioning module to generate aligned textual and visual representations across different levels. The former leverages a pre-trained GPT-3 to infer refined descriptions of the global and local (body-part-based and temporal-interval-based) movements from the original action labels. The latter employs an adaptive sampling strategy to group visual features from all body joint movements that are semantically relevant to a given description. Our approach is evaluated on various skeleton/language backbones and three large-scale datasets, i.e., NTU-RGB+D 60, NTU-RGB+D 120, and a newly curated dataset, Kinetics-skeleton 200. The results showcase the universality and superior performance of PURLS, surpassing prior skeleton-based solutions and standard baselines from other domains. The source code is available at https://github.com/azzhl/PURLS. |
|---|---|
| Author | Zhu, Anqi; Gong, Mingming; Ke, Qiuhong; Bailey, James |
| Author details | 1. Anqi Zhu, The University of Melbourne, Parkville, VIC 3052, Australia (azzh1@student.unimelb.edu.au); 2. Qiuhong Ke, Monash University, Clayton, VIC 3800, Australia (qiuhong.ke@monash.edu); 3. Mingming Gong, The University of Melbourne, Parkville, VIC 3052, Australia (mingming.gong@unimelb.edu.au); 4. James Bailey, The University of Melbourne, Parkville, VIC 3052, Australia (baileyj@unimelb.edu.au) |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DOI | 10.1109/CVPR52733.2024.01775 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan (POP) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP) 1998-present |
| Discipline | Applied Sciences |
| EISBN | 9798350353006 |
| EISSN | 1063-6919 |
| EndPage | 18770 |
| ExternalDocumentID | 10656505 |
| Genre | orig-research |
| ISICitedReferencesCount | 14 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| PageCount | 10 |
| PublicationCentury | 2000 |
| PublicationDate | 2024-June-16 |
| PublicationDateYYYYMMDD | 2024-06-16 |
| PublicationDecade | 2020 |
| PublicationTitle | Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online) |
| PublicationTitleAbbrev | CVPR |
| PublicationYear | 2024 |
| Publisher | IEEE |
| StartPage | 18761 |
| SubjectTerms | action recognition; Computer vision; contrastive learning; large language model; Large language models; Natural languages; Pattern recognition; representation learning; Skeleton; skeleton action recognition; Source coding; Visualization; zero-shot learning |
| Title | Part-Aware Unified Representation of Language and Skeleton for Zero-Shot Action Recognition |
| URI | https://ieeexplore.ieee.org/document/10656505 |
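
The abstract describes zero-shot recognition by aligning skeleton features with text descriptions at both global and local (body-part) scales, but it does not spell out a scoring rule. The following is a minimal, hypothetical sketch of that general idea, not the PURLS method itself (the authors' code is at https://github.com/azzhl/PURLS). It assumes each unseen class already has text embeddings for a global description and for body-part descriptions (for example, produced by some text encoder from GPT-3-generated descriptions), and that per-joint skeleton features live in the same embedding space. The softmax attention pooling in `description_guided_pool` is a simple stand-in chosen for illustration, not the paper's adaptive sampling strategy; all function names, the `"global"` key, and the temperature value are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (a zero vector stays all zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def softmax(scores: Sequence[float], temperature: float = 1.0) -> Vector:
    """Numerically stable softmax; a lower temperature gives sharper weights."""
    peak = max(scores)
    exps = [math.exp((s - peak) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def description_guided_pool(joint_features: List[Vector], description: Vector,
                            temperature: float = 0.1) -> Vector:
    """Hypothetical stand-in for description-guided grouping: weight every joint
    by its similarity to one description embedding, then take the weighted mean."""
    weights = softmax([cosine(j, description) for j in joint_features], temperature)
    dim = len(joint_features[0])
    return [sum(w * j[d] for w, j in zip(weights, joint_features)) for d in range(dim)]


def zero_shot_predict(joint_features: List[Vector],
                      class_descriptions: Dict[str, Dict[str, Vector]],
                      temperature: float = 0.1) -> str:
    """Score each unseen class by averaging its global and part-level cosine
    similarities; class_descriptions maps label -> {"global": vec, part: vec, ...}."""
    dim = len(joint_features[0])
    # Global visual feature: plain mean over all joints.
    global_feat = [sum(j[d] for j in joint_features) / len(joint_features)
                   for d in range(dim)]
    scores: Dict[str, float] = {}
    for label, descs in class_descriptions.items():
        sims = [cosine(global_feat, descs["global"])]
        for part, text_vec in descs.items():
            if part == "global":
                continue
            # Local visual feature pooled from the joints relevant to this description.
            local_feat = description_guided_pool(joint_features, text_vec, temperature)
            sims.append(cosine(local_feat, text_vec))
        scores[label] = sum(sims) / len(sims)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Toy 2-D embeddings: joint 0 (a hand) moves along axis 0, joints 1-2 (legs) barely move.
    joints = [[1.0, 0.0], [0.0, 0.1], [0.0, 0.1]]
    unseen_classes = {
        "wave": {"global": [1.0, 0.0], "hands": [1.0, 0.0]},
        "kick": {"global": [0.0, 1.0], "legs": [0.0, 1.0]},
    }
    print(zero_shot_predict(joints, unseen_classes))  # prints "wave"
```

Weighting global and part-level similarities equally is an arbitrary choice for this sketch. Temporal-interval descriptions could, under the same assumptions, be handled analogously by pooling over frames instead of joints.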