Pedestrian Re-Recognition Based on Spatiotemporal Transformer Skeleton Contrastive Learning and Feature Optimization

Bibliographic details
Published in:	Journal of advanced computational intelligence and intelligent informatics, Vol. 29, No. 6, pp. 1249-1261
Main authors:	Jia, Yanru; Zhang, Yuanyuan; Gao, Yilun
Format:	Journal Article
Language:	English
Published:	Tokyo: Fuji Technology Press Co. Ltd, 20 November 2025
Subjects:	Cameras; Computer vision; Image reconstruction; Learning; Occlusion; Semantics; Spacetime
ISSN:	1343-0130, 1883-8014

Description
Summary:	Person re-identification is an important computer vision task that aims to confirm identity across cameras by identifying and matching the same pedestrian in different camera views. Traditional image-based methods are sensitive to lighting changes, occlusion, and changes in viewing angle, and under these conditions the advantages of skeleton data become increasingly apparent. Existing methods typically either design skeleton descriptors from raw body joints or learn skeleton sequence representations, but they often cannot simultaneously model the relationships between different body components, and they rarely model skeleton information along both the temporal and spatial dimensions. This paper therefore proposes a universal skeleton contrastive learning method based on a spatiotemporal Transformer (Space-time Transformer, StFormer). First, the method adopts a Space-time Attention (S-T Attention) mechanism and models the relationships among spatiotemporal features by stacking multiple S-T Attention blocks. Second, a Feature Refinement Box (FR Box) is proposed to improve the model's ability to extract important cues from the data features. Finally, we propose a unique prompt learning mechanism (P-Study) that uses the spatiotemporal context of graph nodes to prompt skeleton graph reconstruction and to help capture more valuable patterns and graph semantics.
DOI:	10.20965/jaciii.2025.p1249
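
The summary mentions stacking Space-time Attention blocks over skeleton sequences but gives no architectural detail. As a rough illustration only, the sketch below shows one common way to factorize attention over a skeleton tensor of shape (frames, joints, channels): attention among joints within each frame, followed by attention among frames for each joint. Everything here is an assumption and not the authors' StFormer: the function names (`attention`, `st_block`, `init_params`, `encode`), the single-head design, the spatial-then-temporal order, the residual connections, the random untrained weights, and the mean pooling are all hypothetical choices made for the example.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(x, wq, wk, wv):
    # Single-head scaled dot-product self-attention.
    # x has shape (batch, tokens, channels); tokens attend to each other.
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def st_block(x, params):
    # x has shape (T frames, J joints, C channels).
    # Spatial step (assumed): joints are the tokens, frames act as the batch.
    x = x + attention(x, *params["spatial"])
    # Temporal step (assumed): frames are the tokens, joints act as the batch.
    xt = np.swapaxes(x, 0, 1)                      # (J, T, C)
    xt = xt + attention(xt, *params["temporal"])
    return np.swapaxes(xt, 0, 1)                   # back to (T, J, C)


def init_params(rng, channels):
    # Hypothetical random projection matrices; a real model would learn these.
    def make():
        scale = channels ** -0.5
        return tuple(rng.normal(0.0, scale, (channels, channels)) for _ in range(3))
    return {"spatial": make(), "temporal": make()}


def encode(x, blocks):
    # Stack several blocks, then mean-pool into one sequence-level embedding.
    for params in blocks:
        x = st_block(x, params)
    return x.mean(axis=(0, 1))


rng = np.random.default_rng(0)
skeleton = rng.normal(size=(8, 20, 16))            # 8 frames, 20 joints, 16 channels
blocks = [init_params(rng, 16) for _ in range(3)]  # three stacked blocks
embedding = encode(skeleton, blocks)
print(embedding.shape)                             # (16,)
```

In a contrastive setup, embeddings like this one would be compared across sequences of the same and of different identities; the loss, the FR Box, and the P-Study prompting described in the summary are not sketched here because the record does not specify them.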