Pedestrian Re-Recognition Based on Spatiotemporal Transformer Skeleton Contrastive Learning and Feature Optimization
Person re-identification is an important task in computer vision, aimed at achieving cross-camera identity confirmation by identifying and matching the same pedestrian under different cameras. However, when traditional image-based methods are affected by factors such as lighting changes, occlusion,...
Saved in:
| Published in: | Journal of advanced computational intelligence and intelligent informatics Vol. 29; no. 6; pp. 1249 - 1261 |
|---|---|
| Main Authors: | , , |
| Format: | Journal Article |
| Language: | English |
| Published: |
Tokyo
Fuji Technology Press Co. Ltd
20.11.2025
|
| Subjects: | |
| ISSN: | 1343-0130, 1883-8014 |
| Online Access: | Get full text |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| Summary: | Person re-identification is an important task in computer vision, aimed at achieving cross-camera identity confirmation by identifying and matching the same pedestrian under different cameras. However, when traditional image-based methods are affected by factors such as lighting changes, occlusion, and changes in viewing angles, the advantages of skeleton data become increasingly apparent. Existing methods typically use primitive body joint design skeleton descriptors or learn skeleton sequence representations, but they often cannot simultaneously simulate the relationships between different body components, and rarely model skeleton information from both temporal and spatial dimensions. Therefore, in this paper, we propose a universal skeleton contrastive learning method based on the spatiotemporal Transformer (Space-time Transformer, StFormer). The method first adopts the Space-time Attention (S-T Attention) mechanism and achieves relationship modeling of spatiotemporal features by stacking multiple S-T Attention blocks. Secondly, to improve the important clues for extracting data features from the model, a Feature Refinement Box (FR Box) was proposed. Finally, we purpose a unique prompt learning mechanism (P-Study) which utilizes the spatiotemporal context of graph nodes to prompt skeleton graph reconstruction and help capture more valuable patterns and graph semantics. |
|---|---|
| Bibliography: | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 |
| ISSN: | 1343-0130 1883-8014 |
| DOI: | 10.20965/jaciii.2025.p1249 |