Compact Latent Primitive Space Learning for Compositional Zero-Shot Learning

Bibliographic Details
Published in: IEEE Transactions on Multimedia, Vol. 27, pp. 4297–4308
Main Authors: Jiang, Han; Chen, Chaofan; Yang, Xiaoshan; Xu, Changsheng
Format: Journal Article
Language: English
Published: IEEE, 2025
ISSN: 1520-9210, 1941-0077
Description
Summary: Compositional zero-shot learning (CZSL) aims to recognize novel compositions formed by known primitives (attributes and objects). The key challenge of CZSL is the visual diversity of primitives caused by the dependencies between attributes and objects. To address this problem, most existing methods attempt to mine primitive-invariant features shared across all compositions or to learn primitive-variant features specialized for each composition. However, these methods overlook the fact that primitives have inherent similarities and differences across compositions: a primitive may exhibit a common visual appearance in some compositions but a different expression in others. To fully explore this partial similarity and visual diversity of primitives, we propose a compact latent primitive space learning framework that explicitly leverages a set of codewords to encode primitive features, striking a balance between generality and diversity. Specifically, we borrow the idea of discriminative sparse coding to learn representative codewords that build the latent primitive space. Through a sparse reconstruction loss, a contrastive loss, and an orthogonal constraint, the model adaptively reconstructs primitive features according to the similarity weights between the primitive features and the codewords. Comprehensive experiments on four benchmarks demonstrate that the proposed method outperforms previous methods.
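The record contains only the abstract, not the implementation. As a rough illustration of the mechanism it describes, the PyTorch sketch below shows one plausible way to reconstruct primitive features from a learnable codebook using similarity weights, a sparse reconstruction loss, and an orthogonality constraint. The class name, codebook size, temperature, and loss weights are all hypothetical, and the paper's contrastive loss over compositions is omitted; this is not the authors' code.

```python
import torch
import torch.nn.functional as F

class LatentPrimitiveSpace(torch.nn.Module):
    """Illustrative sketch (not the paper's implementation): reconstruct
    primitive features from a small set of learnable codewords via
    similarity-weighted sparse coding."""

    def __init__(self, num_codewords: int = 32, dim: int = 512, tau: float = 0.1):
        super().__init__()
        # Learnable codewords spanning the latent primitive space.
        self.codebook = torch.nn.Parameter(torch.randn(num_codewords, dim))
        self.tau = tau  # softmax temperature (assumed hyperparameter)

    def forward(self, feats: torch.Tensor):
        # Cosine-similarity weights between features (B, D) and codewords (K, D).
        sims = F.normalize(feats, dim=-1) @ F.normalize(self.codebook, dim=-1).T
        weights = F.softmax(sims / self.tau, dim=-1)   # (B, K) similarity weights
        recon = weights @ self.codebook                # (B, D) reconstruction
        return recon, weights

def losses(feats, recon, weights, codebook, l1=1e-3, ortho=1e-2):
    # Sparse reconstruction: fit the features while keeping weights sparse.
    rec_loss = F.mse_loss(recon, feats) + l1 * weights.abs().mean()
    # Orthogonality constraint keeps the codewords decorrelated.
    C = F.normalize(codebook, dim=-1)
    eye = torch.eye(C.size(0), device=C.device)
    ortho_loss = ortho * ((C @ C.T - eye) ** 2).mean()
    return rec_loss + ortho_loss

# Example: reconstruct a batch of 8 primitive features of dimension 512.
model = LatentPrimitiveSpace()
feats = torch.randn(8, 512)
recon, weights = model(feats)
loss = losses(feats, recon, weights, model.codebook)
loss.backward()
```

In this reading, the softmax weights play the role of the abstract's similarity weights: a primitive whose appearance is common across compositions concentrates its weight on shared codewords, while a visually divergent instance draws on different ones.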
DOI: 10.1109/TMM.2025.3535315