Image–text feature learning for unsupervised visible–infrared person re-identification


Detailed bibliography
Published in: Image and Vision Computing, Volume 158, Article 105520
Main authors: Guo, Jifeng; Pang, Zhiqi
Format: Journal Article
Language: English
Published: Elsevier B.V., 01.05.2025
ISSN: 0262-8856
Description
Summary: Visible–infrared person re-identification (VI-ReID) focuses on matching infrared and visible images of the same person. To reduce labeling costs, unsupervised VI-ReID (UVI-ReID) methods typically use clustering algorithms to generate pseudo-labels and iteratively optimize the model based on these pseudo-labels. Although existing UVI-ReID methods have achieved promising performance, they often overlook the effectiveness of text semantics in inter-modality matching and modality-invariant feature learning. In this paper, we propose an image–text feature learning (ITFL) method, which not only leverages text semantics to enhance intra-modality identity-related learning but also incorporates text semantics into inter-modality matching and modality-invariant feature learning. Specifically, ITFL first performs modality-aware feature learning to generate pseudo-labels within each modality. Then, ITFL employs modality-invariant text modeling (MTM) to learn a text feature for each cluster in the visible modality, and utilizes inter-modality dual-semantics matching (IDM) to match inter-modality positive clusters. To obtain modality-invariant and identity-related image features, we not only introduce a cross-modality contrastive loss in ITFL to mitigate the impact of modality gaps, but also develop a text semantic consistency loss to further promote modality-invariant feature learning. Extensive experimental results on VI-ReID datasets demonstrate that ITFL not only outperforms existing unsupervised methods but also competes with some supervised approaches.

Highlights:
• We introduce text semantics into both inter-modality matching and modality-invariant feature learning.
• We match inter-modality positive clusters based on dual semantics.
• A text semantic consistency loss is introduced for modality-invariant learning.
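The abstract names two objectives, a cross-modality contrastive loss and a text semantic consistency loss, without giving their formulations. As a rough illustration only, the PyTorch sketch below shows one plausible cluster-level form of each; the function names, tensor shapes, temperature value, and cosine-distance formulation are assumptions made for this sketch, not the paper's actual definitions.

    import torch
    import torch.nn.functional as F

    def cross_modality_contrastive_loss(feats, pos_centroids, all_centroids,
                                        temperature=0.05):
        # Hypothetical sketch: InfoNCE-style loss that pulls each image feature
        # toward its matched cluster centroid in the other modality and pushes
        # it away from the remaining cross-modality centroids.
        #   feats:         (B, D) image features from one modality
        #   pos_centroids: (B, D) matched (positive) centroids, other modality
        #   all_centroids: (K, D) all cluster centroids, other modality
        feats = F.normalize(feats, dim=1)
        pos_centroids = F.normalize(pos_centroids, dim=1)
        all_centroids = F.normalize(all_centroids, dim=1)
        pos_sim = (feats * pos_centroids).sum(dim=1) / temperature   # (B,)
        all_sim = feats @ all_centroids.t() / temperature            # (B, K)
        # -log( exp(pos) / sum_k exp(sim_k) ) = logsumexp(all) - pos
        return (torch.logsumexp(all_sim, dim=1) - pos_sim).mean()

    def text_semantic_consistency_loss(vis_feats, ir_feats, text_feats):
        # Hypothetical sketch: push visible and infrared image features of a
        # matched cluster pair toward the same shared text feature, so that
        # both modalities agree with one modality-neutral text anchor.
        vis_feats = F.normalize(vis_feats, dim=1)
        ir_feats = F.normalize(ir_feats, dim=1)
        text_feats = F.normalize(text_feats, dim=1)
        vis_dist = 1.0 - (vis_feats * text_feats).sum(dim=1)  # cosine distance
        ir_dist = 1.0 - (ir_feats * text_feats).sum(dim=1)
        return (vis_dist + ir_dist).mean()

In this reading, the text feature learned by MTM for each visible cluster acts as a modality-neutral anchor: the contrastive term handles discrimination across clusters, while the consistency term ties both modalities to the shared text semantics.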
DOI: 10.1016/j.imavis.2025.105520