Image–text feature learning for unsupervised visible–infrared person re-identification
Saved in:
| Published in: | Image and vision computing, Vol. 158, p. 105520 |
|---|---|
| Main Authors: | , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier B.V., 01.05.2025 |
| ISSN: | 0262-8856 |
| Online Access: | Get full text |
| Summary: | Visible–infrared person re-identification (VI-ReID) focuses on matching infrared and visible images of the same person. To reduce labeling costs, unsupervised VI-ReID (UVI-ReID) methods typically use clustering algorithms to generate pseudo-labels and iteratively optimize the model based on these pseudo-labels. Although existing UVI-ReID methods have achieved promising performance, they often overlook the effectiveness of text semantics in inter-modality matching and modality-invariant feature learning. In this paper, we propose an image–text feature learning (ITFL) method, which not only leverages text semantics to enhance intra-modality identity-related learning but also incorporates text semantics into inter-modality matching and modality-invariant feature learning. Specifically, ITFL first performs modality-aware feature learning to generate pseudo-labels within each modality. Then, ITFL employs modality-invariant text modeling (MTM) to learn a text feature for each cluster in the visible modality, and utilizes inter-modality dual-semantics matching (IDM) to match inter-modality positive clusters. To obtain modality-invariant and identity-related image features, we not only introduce a cross-modality contrastive loss in ITFL to mitigate the impact of modality gaps, but also develop a text semantic consistency loss to further promote modality-invariant feature learning. Extensive experimental results on VI-ReID datasets demonstrate that ITFL not only outperforms existing unsupervised methods but also competes with some supervised approaches.
• We introduce text semantics into both inter-modality matching and learning. • We match inter-modality positive clusters based on dual semantics. • A text semantic consistency loss is introduced for modality-invariant learning. |
|---|---|
| DOI: | 10.1016/j.imavis.2025.105520 |
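The abstract names two objectives: a cross-modality contrastive loss that pulls image features toward their matched cross-modality cluster, and a text semantic consistency loss that keeps image features of both modalities close to the text feature learned for their cluster. The sketch below is a minimal, illustrative formulation of such losses, not the authors' published code; the function names, the prototype-based InfoNCE form, the cosine-distance consistency term, and the temperature `tau` are all assumptions.

```python
import torch
import torch.nn.functional as F

def cross_modality_contrastive_loss(img_feats, cluster_protos, labels, tau=0.07):
    """InfoNCE-style loss: pull each image feature toward the prototype of its
    matched cross-modality cluster, away from all other prototypes (assumed form)."""
    img_feats = F.normalize(img_feats, dim=1)             # (N, D)
    cluster_protos = F.normalize(cluster_protos, dim=1)   # (K, D)
    logits = img_feats @ cluster_protos.t() / tau         # (N, K) scaled cosine sims
    return F.cross_entropy(logits, labels)                # labels: matched cluster ids

def text_semantic_consistency_loss(img_feats, text_feats, labels):
    """Cosine alignment between each image feature (either modality) and the
    text feature of its cluster (assumed form of the consistency objective)."""
    img_feats = F.normalize(img_feats, dim=1)             # (N, D)
    text_feats = F.normalize(text_feats, dim=1)           # (K, D)
    target = text_feats[labels]                           # (N, D) per-sample text feature
    return (1.0 - (img_feats * target).sum(dim=1)).mean()

# Toy usage: 8 images, 10 clusters, 256-dim features.
imgs = torch.randn(8, 256)
texts = torch.randn(10, 256)
lbls = torch.randint(0, 10, (8,))
loss = (cross_modality_contrastive_loss(imgs, texts, lbls)
        + text_semantic_consistency_loss(imgs, texts, lbls))
```

Under these assumptions, the contrastive term handles inter-modality discrimination at the cluster level while the consistency term anchors both modalities to a shared text semantic, which is the modality-invariant effect the abstract describes.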