Part-Aware Unified Representation of Language and Skeleton for Zero-Shot Action Recognition

Detailed bibliography
Published in: Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online), pp. 18761-18770
Main authors: Zhu, Anqi; Ke, Qiuhong; Gong, Mingming; Bailey, James
Format: Conference paper
Language: English
Published: IEEE, 16 June 2024
ISSN: 1063-6919
Abstract While remarkable progress has been made on supervised skeleton-based action recognition, the challenge of zero-shot recognition remains relatively unexplored. In this paper, we argue that relying solely on aligning label-level semantics and global skeleton features is insufficient to effectively transfer locally consistent visual knowledge from seen to unseen classes. To address this limitation, we introduce Part-aware Unified Representation between Language and Skeleton (PURLS) to explore visual-semantic alignment at both local and global scales. PURLS introduces a new prompting module and a novel partitioning module to generate aligned textual and visual representations across different levels. The former leverages a pre-trained GPT-3 to infer refined descriptions of the global and local (body-part-based and temporal-interval-based) movements from the original action labels. The latter employs an adaptive sampling strategy to group visual features from all body joint movements that are semantically relevant to a given description. Our approach is evaluated on various skeleton/language backbones and three large-scale datasets, i.e., NTU-RGB+D 60, NTU-RGB+D 120, and a newly curated dataset, Kinetics-skeleton 200. The results showcase the universality and superior performance of PURLS, surpassing prior skeleton-based solutions and standard baselines from other domains. The source code can be accessed at https://github.com/azzhl/PURLS.
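The abstract describes matching skeleton features to text descriptions at both a global scale and a local (body-part or temporal-interval) scale, then recognizing unseen classes. As a minimal sketch of that general idea, assuming joint features and description embeddings already share one vector space (in the paper they come from skeleton and language backbones, with descriptions generated by GPT-3), the Python below scores each unseen class by averaging a global cosine similarity with per-description local similarities. The helper names (`cosine`, `softmax`, `group_joints`, `class_score`, `zero_shot_predict`), the softmax-weighted pooling used in place of the paper's adaptive sampling strategy, the temperature of 0.1, and the toy vectors are all illustrative assumptions, not the authors' code.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def softmax(scores: List[float], temperature: float) -> List[float]:
    """Numerically stable softmax; a lower temperature gives sharper weights."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def group_joints(joint_feats: List[Vector], desc_emb: Vector,
                 temperature: float = 0.1) -> List[float]:
    """Hypothetical grouping step: pool joint features by relevance to one description.

    Each joint is weighted by the softmax of its cosine similarity to the
    description embedding, so joints that match the description dominate.
    This is an attention-style stand-in, not the paper's adaptive sampling.
    """
    weights = softmax([cosine(f, desc_emb) for f in joint_feats], temperature)
    dim = len(joint_feats[0])
    return [sum(w * f[i] for w, f in zip(weights, joint_feats)) for i in range(dim)]


def class_score(joint_feats: List[Vector], global_desc: Vector,
                local_descs: List[Vector], temperature: float = 0.1) -> float:
    """Average the global and per-description (local) cosine similarities."""
    dim = len(joint_feats[0])
    # Global visual feature: plain mean over all joints.
    global_visual = [sum(f[i] for f in joint_feats) / len(joint_feats) for i in range(dim)]
    scores = [cosine(global_visual, global_desc)]
    # Local features: one description-specific pooled feature per local description.
    for desc in local_descs:
        scores.append(cosine(group_joints(joint_feats, desc, temperature), desc))
    return sum(scores) / len(scores)


def zero_shot_predict(joint_feats: List[Vector], class_descs: Dict[str, dict],
                      temperature: float = 0.1) -> str:
    """Return the unseen class whose global + local descriptions best match the skeleton."""
    return max(
        class_descs,
        key=lambda c: class_score(joint_feats, class_descs[c]["global"],
                                  class_descs[c]["local"], temperature),
    )


if __name__ == "__main__":
    # Toy 2-D features for three joints; class names and embeddings are made up.
    joints = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    unseen = {
        "wave": {"global": [1.0, 0.2], "local": [[1.0, 0.0], [0.8, 0.2]]},
        "kick": {"global": [0.0, 1.0], "local": [[0.0, 1.0]]},
    }
    print(zero_shot_predict(joints, unseen))  # prints: wave
```

Averaging the global score with the local scores lets part-level evidence influence the decision even when the pooled global feature matches a class only loosely. How PURLS actually combines and trains these alignments (the record's subject terms list contrastive learning) is described in the paper and is not reproduced by this sketch.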
Author Zhu, Anqi
Gong, Mingming
Ke, Qiuhong
Bailey, James
Author_xml – sequence: 1
  givenname: Anqi
  surname: Zhu
  fullname: Zhu, Anqi
  email: azzh1@student.unimelb.edu.au
  organization: The University of Melbourne, Parkville, VIC, Australia, 3052
– sequence: 2
  givenname: Qiuhong
  surname: Ke
  fullname: Ke, Qiuhong
  email: qiuhong.ke@monash.edu
  organization: Monash University, Clayton, VIC, Australia, 3800
– sequence: 3
  givenname: Mingming
  surname: Gong
  fullname: Gong, Mingming
  email: mingming.gong@unimelb.edu.au
  organization: The University of Melbourne, Parkville, VIC, Australia, 3052
– sequence: 4
  givenname: James
  surname: Bailey
  fullname: Bailey, James
  email: baileyj@unimelb.edu.au
  organization: The University of Melbourne, Parkville, VIC, Australia, 3052
CODEN IEEPAD
ContentType Conference Proceeding
DOI 10.1109/CVPR52733.2024.01775
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP) 1998-present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Applied Sciences
EISBN 9798350353006
EISSN 1063-6919
EndPage 18770
ExternalDocumentID 10656505
Genre orig-research
ID FETCH-LOGICAL-i176t-4a3a46df58ce80031cdec2c45e47d66f6a5ad096d1c55905b639475951b169d43
IEDL.DBID RIE
ISICitedReferencesCount 14
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001342515502010&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
IngestDate Wed Aug 27 02:00:57 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
PageCount 10
ParticipantIDs ieee_primary_10656505
PublicationCentury 2000
PublicationDate 2024-June-16
PublicationDateYYYYMMDD 2024-06-16
PublicationDate_xml – month: 06
  year: 2024
  text: 2024-June-16
  day: 16
PublicationDecade 2020
PublicationTitle Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online)
PublicationTitleAbbrev CVPR
PublicationYear 2024
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssj0003211698
Score 2.3779042
SourceID ieee
SourceType Publisher
StartPage 18761
SubjectTerms action recognition
Computer vision
contrastive learning
large language model
Large language models
Natural languages
Pattern recognition
representation learning
Skeleton
skeleton action recognition
Source coding
Visualization
zero-shot learning
Title Part-Aware Unified Representation of Language and Skeleton for Zero-Shot Action Recognition
URI https://ieeexplore.ieee.org/document/10656505
WOSCitedRecordID wos001342515502010&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
linkProvider IEEE
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=proceeding&rft.title=Proceedings+%28IEEE+Computer+Society+Conference+on+Computer+Vision+and+Pattern+Recognition.+Online%29&rft.atitle=Part-Aware+Unified+Representation+of+Language+and+Skeleton+for+Zero-Shot+Action+Recognition&rft.au=Zhu%2C+Anqi&rft.au=Ke%2C+Qiuhong&rft.au=Gong%2C+Mingming&rft.au=Bailey%2C+James&rft.date=2024-06-16&rft.pub=IEEE&rft.eissn=1063-6919&rft.spage=18761&rft.epage=18770&rft_id=info:doi/10.1109%2FCVPR52733.2024.01775&rft.externalDocID=10656505