Image–text feature learning for unsupervised visible–infrared person re-identification


Detailed bibliography
Published in: Image and Vision Computing, Volume 158, Article 105520
Main authors: Guo, Jifeng; Pang, Zhiqi
Format: Journal Article
Language: English
Published: Elsevier B.V., 01.05.2025
ISSN: 0262-8856
Description
Summary: Visible–infrared person re-identification (VI-ReID) focuses on matching infrared and visible images of the same person. To reduce labeling costs, unsupervised VI-ReID (UVI-ReID) methods typically use clustering algorithms to generate pseudo-labels and iteratively optimize the model based on these pseudo-labels. Although existing UVI-ReID methods have achieved promising performance, they often overlook the effectiveness of text semantics in inter-modality matching and modality-invariant feature learning. In this paper, we propose an image–text feature learning (ITFL) method, which not only leverages text semantics to enhance intra-modality identity-related learning but also incorporates text semantics into inter-modality matching and modality-invariant feature learning. Specifically, ITFL first performs modality-aware feature learning to generate pseudo-labels within each modality. Then, ITFL employs modality-invariant text modeling (MTM) to learn a text feature for each cluster in the visible modality, and utilizes inter-modality dual-semantics matching (IDM) to match inter-modality positive clusters. To obtain modality-invariant and identity-related image features, we not only introduce a cross-modality contrastive loss in ITFL to mitigate the impact of modality gaps, but also develop a text semantic consistency loss to further promote modality-invariant feature learning. Extensive experimental results on VI-ReID datasets demonstrate that ITFL not only outperforms existing unsupervised methods but also competes with some supervised approaches.

Highlights:
• We introduce text semantics into both inter-modality matching and modality-invariant feature learning.
• We match inter-modality positive clusters based on dual semantics.
• A text semantic consistency loss is introduced for modality-invariant learning.
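The abstract names two objectives, a cross-modality contrastive loss and a text semantic consistency loss, without giving their formulations. As a rough illustration only, the PyTorch sketch below shows one plausible cluster-level form of each; the function names, tensor shapes, temperature value, and cosine-distance formulation are assumptions made for this sketch, not the paper's actual definitions.

    import torch
    import torch.nn.functional as F

    def cross_modality_contrastive_loss(feats, pos_centroids, all_centroids,
                                        temperature=0.05):
        # Hypothetical sketch: InfoNCE-style loss that pulls each image feature
        # toward its matched cluster centroid in the other modality and pushes
        # it away from the remaining cross-modality centroids.
        #   feats:         (B, D) image features from one modality
        #   pos_centroids: (B, D) matched (positive) centroids, other modality
        #   all_centroids: (K, D) all cluster centroids, other modality
        feats = F.normalize(feats, dim=1)
        pos_centroids = F.normalize(pos_centroids, dim=1)
        all_centroids = F.normalize(all_centroids, dim=1)
        pos_sim = (feats * pos_centroids).sum(dim=1) / temperature   # (B,)
        all_sim = feats @ all_centroids.t() / temperature            # (B, K)
        # -log( exp(pos) / sum_k exp(sim_k) ) = logsumexp(all) - pos
        return (torch.logsumexp(all_sim, dim=1) - pos_sim).mean()

    def text_semantic_consistency_loss(vis_feats, ir_feats, text_feats):
        # Hypothetical sketch: push visible and infrared image features of a
        # matched cluster pair toward the same shared text feature, so that
        # both modalities agree with one modality-neutral text anchor.
        vis_feats = F.normalize(vis_feats, dim=1)
        ir_feats = F.normalize(ir_feats, dim=1)
        text_feats = F.normalize(text_feats, dim=1)
        vis_dist = 1.0 - (vis_feats * text_feats).sum(dim=1)  # cosine distance
        ir_dist = 1.0 - (ir_feats * text_feats).sum(dim=1)
        return (vis_dist + ir_dist).mean()

In this reading, the text feature learned by MTM for each visible cluster acts as a modality-neutral anchor: the contrastive term handles discrimination across clusters, while the consistency term ties both modalities to the shared text semantics.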
DOI: 10.1016/j.imavis.2025.105520