Compact Latent Primitive Space Learning for Compositional Zero-Shot Learning
| Published in: | IEEE Transactions on Multimedia, Vol. 27, pp. 4297 - 4308 |
|---|---|
| Main Authors: | , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | IEEE, 2025 |
| ISSN: | 1520-9210, 1941-0077 |
| Summary: | Compositional zero-shot learning (CZSL) aims to recognize novel compositions formed by known primitives (attributes and objects). The key challenge of CZSL is the visual diversity of primitives caused by the dependencies between attributes and objects. To solve this problem, most existing methods attempt to mine primitive-invariant features shared across all compositions, or to learn primitive-variant features specialized for each composition. However, these methods overlook that primitives have inherent similarities and differences across compositions, i.e., a primitive may exhibit a common visual appearance under some compositions yet look different under others. To fully explore this partial similarity and visual diversity of primitives, we propose a compact latent primitive space learning framework, which explicitly leverages multiple codewords to encode primitive features and strike a balance between generality and diversity. Specifically, we borrow the idea of discriminative sparse coding to learn these representative codewords and build the latent primitive space. Through a sparse reconstruction loss, a contrastive loss, and an orthogonality constraint, our model adaptively reconstructs primitive features according to the similarity weights between the primitive features and the codewords. Comprehensive experiments on four benchmarks demonstrate that the proposed method outperforms previous methods. |
|---|---|
| DOI: | 10.1109/TMM.2025.3535315 |
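
The summary outlines the core mechanism: primitive features are adaptively reconstructed from a learned codebook via similarity weights, regularized by a sparse reconstruction loss and an orthogonality constraint on the codewords. A minimal PyTorch sketch of that idea follows; all names, dimensions, and loss weightings are illustrative assumptions rather than the authors' implementation, and the contrastive loss over compositions is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LatentPrimitiveSpace(nn.Module):
    """Hypothetical codebook over primitive (attribute/object) features."""

    def __init__(self, num_codewords: int = 64, dim: int = 512):
        super().__init__()
        # Learnable codewords spanning the latent primitive space.
        self.codebook = nn.Parameter(torch.randn(num_codewords, dim))

    def forward(self, feats: torch.Tensor):
        # Similarity weights between primitive features and codewords.
        sims = F.normalize(feats, dim=-1) @ F.normalize(self.codebook, dim=-1).t()
        weights = torch.softmax(sims, dim=-1)  # (batch, num_codewords)
        # Adaptive reconstruction as a weighted sum of codewords.
        recon = weights @ self.codebook
        return recon, weights


def sparse_reconstruction_loss(feats, recon, weights, l1_weight=0.01):
    # Fit the reconstruction while pushing assignment weights toward sparsity.
    return F.mse_loss(recon, feats) + l1_weight * weights.abs().mean()


def orthogonality_loss(codebook):
    # Penalize off-diagonal codeword correlations so codewords stay distinct.
    cb = F.normalize(codebook, dim=-1)
    gram = cb @ cb.t()
    eye = torch.eye(gram.size(0), device=gram.device)
    return ((gram - eye) ** 2).mean()


# Toy usage with random features standing in for backbone outputs.
model = LatentPrimitiveSpace(num_codewords=64, dim=512)
feats = torch.randn(32, 512)
recon, weights = model(feats)
loss = (sparse_reconstruction_loss(feats, recon, weights)
        + 0.1 * orthogonality_loss(model.codebook))
```

Note that a faithful discriminative sparse coding formulation would solve an L1-regularized coding step per feature; the softmax assignment above is a cheaper stand-in used only to keep the sketch self-contained.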