Spatial–Temporal Heatmap Masked Autoencoder for Skeleton-Based Action Recognition

Bibliographic details
Published in: Sensors (Basel, Switzerland), Vol. 25, No. 10, p. 3146
Main authors: Bian, Cunling; Yang, Yang; Wang, Tao; Lu, Weigang
Format: Journal Article
Language: English
Published: Switzerland: MDPI AG, 16 May 2025
ISSN: 1424-8220
Description
Summary: Skeleton representation learning offers substantial advantages for action recognition by encoding intricate motion details and spatial–temporal dependencies among joints. However, fully supervised approaches necessitate large amounts of annotated data, which are often labor-intensive and costly to acquire. In this work, we propose the Spatial–Temporal Heatmap Masked Autoencoder (STH-MAE), a novel self-supervised framework tailored for skeleton-based action recognition. Unlike coordinate-based methods, STH-MAE adopts heatmap volumes as its primary representation, mitigating noise inherent in pose estimation while capitalizing on advances in Vision Transformers. The framework constructs a spatial–temporal heatmap (STH) by aggregating 2D joint heatmaps across both spatial and temporal axes. This STH is partitioned into non-overlapping patches to facilitate local feature learning, with a masking strategy applied to randomly conceal portions of the input. During pre-training, a Vision Transformer-based autoencoder equipped with a lightweight prediction head reconstructs the masked regions, fostering the extraction of robust and transferable skeletal representations. Comprehensive experiments on the NTU RGB+D 60 and NTU RGB+D 120 benchmarks demonstrate the superiority of STH-MAE, achieving state-of-the-art performance under multiple evaluation protocols.
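The patch partitioning and random masking steps mentioned in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `patchify` and `random_mask`, the toy 8x8 grid, the patch size of 4, and the mask ratio of 0.75 are all assumptions chosen for illustration, and the record does not say how the spatial–temporal heatmap itself is aggregated, so that step is left out.

```python
import random


def patchify(grid, patch):
    """Split a 2D grid (list of rows) into non-overlapping patch x patch tiles."""
    height, width = len(grid), len(grid[0])
    if height % patch or width % patch:
        raise ValueError("grid dimensions must be divisible by the patch size")
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            # Each tile keeps `patch` rows, each cut to `patch` columns.
            tile = [row[left:left + patch] for row in grid[top:top + patch]]
            patches.append(tile)
    return patches


def random_mask(num_patches, mask_ratio, rng):
    """Return (visible, masked) patch indices, each sorted ascending."""
    num_masked = int(num_patches * mask_ratio)
    order = list(range(num_patches))
    rng.shuffle(order)  # random permutation decides which patches are hidden
    masked = sorted(order[:num_masked])
    visible = sorted(order[num_masked:])
    return visible, masked


# Toy stand-in for a heatmap (assumption: a single 8x8 plane of floats).
grid = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
patches = patchify(grid, patch=4)

# Assumed mask ratio of 0.75; the record does not state the value used.
visible, masked = random_mask(len(patches), mask_ratio=0.75, rng=random.Random(0))
```

In a masked-autoencoder setup of this kind, only the `visible` patches would typically be passed to the encoder, and the reconstruction loss would be computed on the `masked` ones.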
DOI: 10.3390/s25103146