Compact Latent Primitive Space Learning for Compositional Zero-Shot Learning
| Published in: | IEEE Transactions on Multimedia, Vol. 27, pp. 4297 - 4308 |
|---|---|
| Main Authors: | , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | IEEE, 2025 |
| ISSN: | 1520-9210, 1941-0077 |
| Summary: | Compositional zero-shot learning (CZSL) aims to recognize novel compositions formed by known primitives (attributes and objects). The key challenge of CZSL is the visual diversity of primitives caused by the dependencies between attributes and objects. To solve this problem, most existing methods attempt to mine primitive-invariant features shared across all compositions, or to learn primitive-variant features specialized for each composition. However, these methods overlook that primitives have inherent similarities and differences across compositions, i.e., a primitive may exhibit a common visual appearance under some compositions yet look different under others. To fully explore this partial similarity and visual diversity of primitives, we propose a compact latent primitive space learning framework, which explicitly leverages multiple codewords to encode primitive features and strike a balance between generality and diversity. Specifically, we borrow the idea of discriminative sparse coding to learn these representative codewords and build the latent primitive space. Through a sparse reconstruction loss, a contrastive loss, and an orthogonality constraint, our model adaptively reconstructs primitive features according to the similarity weights between the primitive features and the codewords. Comprehensive experiments on four benchmarks demonstrate that the proposed method outperforms previous methods. |
|---|---|
| DOI: | 10.1109/TMM.2025.3535315 |
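
The summary outlines the core mechanism: primitive features are adaptively reconstructed from a learned codebook via similarity weights, regularized by a sparse reconstruction loss and an orthogonality constraint on the codewords. A minimal PyTorch sketch of that idea follows; all names, dimensions, and loss weightings are illustrative assumptions rather than the authors' implementation, and the contrastive loss over compositions is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LatentPrimitiveSpace(nn.Module):
    """Hypothetical codebook over primitive (attribute/object) features."""

    def __init__(self, num_codewords: int = 64, dim: int = 512):
        super().__init__()
        # Learnable codewords spanning the latent primitive space.
        self.codebook = nn.Parameter(torch.randn(num_codewords, dim))

    def forward(self, feats: torch.Tensor):
        # Similarity weights between primitive features and codewords.
        sims = F.normalize(feats, dim=-1) @ F.normalize(self.codebook, dim=-1).t()
        weights = torch.softmax(sims, dim=-1)  # (batch, num_codewords)
        # Adaptive reconstruction as a weighted sum of codewords.
        recon = weights @ self.codebook
        return recon, weights


def sparse_reconstruction_loss(feats, recon, weights, l1_weight=0.01):
    # Fit the reconstruction while pushing assignment weights toward sparsity.
    return F.mse_loss(recon, feats) + l1_weight * weights.abs().mean()


def orthogonality_loss(codebook):
    # Penalize off-diagonal codeword correlations so codewords stay distinct.
    cb = F.normalize(codebook, dim=-1)
    gram = cb @ cb.t()
    eye = torch.eye(gram.size(0), device=gram.device)
    return ((gram - eye) ** 2).mean()


# Toy usage with random features standing in for backbone outputs.
model = LatentPrimitiveSpace(num_codewords=64, dim=512)
feats = torch.randn(32, 512)
recon, weights = model(feats)
loss = (sparse_reconstruction_loss(feats, recon, weights)
        + 0.1 * orthogonality_loss(model.codebook))
```

Note that a faithful discriminative sparse coding formulation would solve an L1-regularized coding step per feature; the softmax assignment above is a cheaper stand-in used only to keep the sketch self-contained.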