Bayesian mixture variational autoencoders for multi-modal learning

Bibliographic Details

Published in: Machine Learning, Vol. 111, No. 12, pp. 4329-4357
Authors: Liao, Keng-Te; Huang, Bo-Wei; Yang, Chih-Chun; Lin, Shou-De
Format: Journal Article
Language: English
Published: New York: Springer US (Springer Nature B.V.), 01.12.2022
ISSN: 0885-6125, 1573-0565
Online access: Full text
Description
Abstract: This paper provides an in-depth analysis of how to effectively acquire and generalize cross-modal knowledge for multi-modal learning. Mixture-of-Experts (MoE) and Product-of-Experts (PoE) are two popular approaches to aggregating multi-modal information. Existing works based on MoE or PoE have shown notable improvements in data generation, but new challenges also emerge, such as high training cost, overconfident experts, and difficulty in encoding modal-specific features. In this work, we propose the Bayesian mixture variational autoencoder (BMVAE), which learns to select or combine experts via Bayesian inference. We show that this idea naturally encourages models to learn modal-specific knowledge and to avoid overconfident experts, and that it is compatible with both the MoE and PoE frameworks. As a MoE model, BMVAE can be optimized with a tight lower bound and is efficient to train; the PoE variant shares these advantages and has a theoretical connection to existing works. In experiments, BMVAE achieves state-of-the-art performance.
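
For readers unfamiliar with the two aggregation schemes the abstract contrasts, the following is a minimal sketch in PyTorch, not code from the paper, of how per-modality Gaussian experts q_m(z|x_m) = N(mu_m, diag(sigma_m^2)) are conventionally combined. PoE multiplies expert densities, so a low-variance expert dominates the result, which illustrates the overconfidence issue the abstract mentions; MoE instead selects an expert according to mixture weights. The poe and moe_sample helpers and the uniform weights are hypothetical illustrations; BMVAE's Bayesian inference over expert selection is not reproduced here.

import torch

def poe(mus, logvars, eps=1e-8):
    # Product-of-Experts: a product of Gaussians is Gaussian, with
    # precision equal to the sum of the experts' precisions and a
    # precision-weighted mean.
    precisions = torch.exp(-logvars) + eps      # [M, D] per-expert precision
    var = 1.0 / precisions.sum(dim=0)           # [D] combined variance
    mu = var * (precisions * mus).sum(dim=0)    # [D] precision-weighted mean
    return mu, torch.log(var)

def moe_sample(mus, logvars, log_weights):
    # Mixture-of-Experts: pick one expert according to its mixture weight
    # (here fixed and uniform; BMVAE would infer such responsibilities),
    # then draw a reparameterized sample from the chosen Gaussian.
    k = torch.distributions.Categorical(logits=log_weights).sample()
    mu, logvar = mus[k], logvars[k]
    return mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)

# Two modality experts over a 4-dim latent: expert 0 is confident
# (small variance), expert 1 is diffuse (large variance).
mus = torch.stack([torch.zeros(4), torch.ones(4)])
logvars = torch.stack([torch.full((4,), -2.0), torch.full((4,), 1.0)])

mu_poe, logvar_poe = poe(mus, logvars)   # pulled toward the confident expert
z = moe_sample(mus, logvars, torch.log(torch.tensor([0.5, 0.5])))

Note how the PoE mean lands near the confident expert regardless of whether that confidence is warranted, whereas the MoE sample comes from exactly one expert, which is the behavior the paper's Bayesian expert selection aims to regulate.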
DOI: 10.1007/s10994-022-06272-y