GAN-MVAE: A discriminative latent feature generation framework for generalized zero-shot learning

Bibliographic details
Published in:	Pattern Recognition Letters, Vol. 155, pp. 77-83
Main authors:	Ma, Peirong; Lu, Hong; Yang, Bohong; Ran, Wu
Format:	Journal Article
Language:	English
Published:	Amsterdam: Elsevier B.V., 1 March 2022; Elsevier Science Ltd
Subjects:	Cross-modal reconstruction; Distribution alignment; Embedding; Generalized Zero-Shot Learning (GZSL); Generative Adversarial Network (GAN); Reconstruction; Semantics; Training; Variational Autoencoder (VAE); Zero-shot learning
ISSN:	0167-8655, 1872-7344

Description
Summary:	• Propose a deep generative model (called GAN-MVAE) for Generalized Zero-Shot Learning.
• Align real and generated feature distributions in the latent space of MVAE.
• Propose a novel MVAE to preserve multi-modal information of the class in the latent space.
• Provide some inspiration for the study of multi-modal alignment and asymmetric VAEs.
• Extensive experimental results show that GAN-MVAE significantly outperforms the state of the art.

Generalized zero-shot learning (GZSL) is a challenging task that aims to recognize both seen and unseen classes. It is achieved by transferring knowledge from seen classes to unseen classes via a shared semantic space (e.g., an attribute space). Recently, generative adversarial networks (GANs) have gained considerable attention in GZSL. A GAN can generate the missing samples of unseen classes from class-specific semantic embeddings for training, thereby transforming GZSL into a traditional classification task and achieving impressive results. However, because of training instability and the complexity of the data distribution, a simple GAN framework cannot capture the real data distribution perfectly, and a large gap remains between the generated and real sample distributions, which severely limits GZSL performance. The proposed GAN-MVAE therefore further aligns the real and generated samples by mapping them into the latent space of a multi-modal reconstruction variational autoencoder (MVAE), while preserving discriminative semantic information through cross-modal reconstruction. GAN-MVAE provides some inspiration for the study of multi-modal alignment and asymmetric VAEs. Extensive experiments on four GZSL benchmark datasets show that GAN-MVAE significantly outperforms the state of the art.
DOI:	10.1016/j.patrec.2022.02.002
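
The summary notes that generating samples for unseen classes turns GZSL into a traditional classification task. The following is a minimal sketch of that general idea only, not the GAN-MVAE method from the article. Everything in it is an assumption made for illustration: the class names, the two-dimensional attributes, the nearest-centroid classifier, and in particular `generate_features`, which is a hand-written stand-in for a trained conditional generator.

```python
import random


def generate_features(attribute, n_samples, rng, noise_scale=0.1):
    """Hypothetical stand-in for a trained generator G(noise, attribute).

    A toy fixed map (scale by 2) plus Gaussian noise; a real system
    would use a learned network here.
    """
    return [
        [2.0 * a + rng.gauss(0.0, noise_scale) for a in attribute]
        for _ in range(n_samples)
    ]


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def train_nearest_centroid(features_by_class):
    """'Train' an ordinary classifier: one mean vector per class."""
    return {label: centroid(rows) for label, rows in features_by_class.items()}


def predict(centroids, x):
    """Return the label whose centroid is closest to x (squared distance)."""
    def sqdist(c):
        return sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return min(centroids, key=lambda label: sqdist(centroids[label]))


rng = random.Random(0)

# Toy class-level semantic embeddings (assumed, two attributes per class).
attributes = {
    "seen_a": [1.0, 0.0],
    "seen_b": [0.0, 1.0],
    "unseen_c": [1.0, 1.0],
}

# Real features exist only for seen classes; here they are simulated
# with the same toy map, purely to keep the example self-contained.
real = {
    label: generate_features(attributes[label], 20, rng)
    for label in ("seen_a", "seen_b")
}

# Synthesize features for the unseen class from its attributes alone.
synthetic = {"unseen_c": generate_features(attributes["unseen_c"], 20, rng)}

# With both sets merged, training is plain supervised classification
# over seen and unseen labels together.
classifier = train_nearest_centroid({**real, **synthetic})

pred_unseen = predict(classifier, [2.1, 1.9])
pred_seen = predict(classifier, [1.9, 0.1])
```

In this toy setup the classifier can assign a test feature to `unseen_c` even though no real sample of that class was available, which is the step the summary describes. The quality of such a classifier depends on how closely the generated features match the real distribution, which is the gap the summary says GAN-MVAE targets.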