Emergent communication of multimodal deep generative models based on Metropolis-Hastings naming game

Detailed bibliography
Published in:	Frontiers in Robotics and AI, Vol. 10; article 1290604
Main authors:	Hoang, Nguyen Le; Taniguchi, Tadahiro; Hagiwara, Yoshinobu; Taniguchi, Akira
Format:	Journal Article
Language:	English
Publisher:	Switzerland: Frontiers Media S.A., 31.01.2024
Subjects:	deep generative model; emergent communication; Metropolis-Hastings; multimodal; Robotics and AI; symbol emergence; variational autoencoder; naming game
ISSN:	2296-9144

Description
Abstract:	Deep generative models (DGMs) are increasingly employed in emergent communication systems. However, their application to multimodal data has been limited. This study proposes a novel model that combines a multimodal DGM with the Metropolis-Hastings (MH) naming game, enabling two agents to attend jointly to a shared subject and develop a common vocabulary. The model is shown to handle multimodal data, even when some modalities are missing. Integrating the MH naming game with multimodal variational autoencoders (VAEs) allows agents to form perceptual categories and exchange signs within multimodal contexts. Moreover, adjusting the weight ratio to favor the modality that the model could learn and categorize more readily improved communication. Our evaluation of three multimodal approaches, namely mixture-of-experts (MoE), product-of-experts (PoE), and mixture-of-product-of-experts (MoPoE), suggests that the choice of approach affects the formation of the latent spaces that serve as the agents' internal representations. Our experiments on the MNIST + SVHN and Multimodal165 datasets indicate that combining a Gaussian mixture model (GMM), a PoE multimodal VAE, and the MH naming game substantially improved information sharing, knowledge formation, and data reconstruction.
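Two ingredients named in the abstract can be illustrated in isolation: fusing per-modality Gaussian posteriors (PoE versus MoE) and the listener's accept/reject step in the MH naming game. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: each modality encoder is assumed to output a diagonal Gaussian (mean and log-variance), the PoE includes a standard-normal prior expert, and the listener's categories are assumed to be diagonal-covariance GMM components indexed by sign. All function names (`poe_fuse`, `moe_sample`, `mh_accept_prob`, `mh_step`) are hypothetical.

```python
import math
import random


def poe_fuse(mus, logvars):
    """Product of diagonal Gaussian experts plus a N(0, 1) prior expert (assumption).

    mus, logvars: one list per *available* modality, all of equal length.
    A missing modality is handled by simply leaving it out of both lists.
    Returns (fused_mean, fused_variance) as lists.
    """
    dim = len(mus[0])
    fused_mu, fused_var = [], []
    for d in range(dim):
        precision = 1.0       # prior expert N(0, 1): precision 1, mean 0
        weighted_mean = 0.0
        for mu, lv in zip(mus, logvars):
            p = math.exp(-lv[d])  # expert precision = 1 / variance
            precision += p
            weighted_mean += p * mu[d]
        fused_var.append(1.0 / precision)
        fused_mu.append(weighted_mean / precision)
    return fused_mu, fused_var


def moe_sample(mus, logvars, rng=random):
    """Sample from a uniform mixture of experts: pick one modality, then sample it."""
    k = rng.randrange(len(mus))
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mus[k], logvars[k])]


def log_gauss_diag(z, mean, var):
    """Log density of z under a diagonal Gaussian."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
               for x, m, v in zip(z, mean, var))


def mh_accept_prob(z_listener, sign_speaker, sign_listener, means, variances):
    """Acceptance probability r = min(1, N(z | w_speaker) / N(z | w_listener)),
    evaluated with the listener's own (hypothetical) GMM category parameters."""
    log_ratio = (log_gauss_diag(z_listener, means[sign_speaker], variances[sign_speaker])
                 - log_gauss_diag(z_listener, means[sign_listener], variances[sign_listener]))
    return math.exp(min(0.0, log_ratio))


def mh_step(z_listener, sign_speaker, sign_listener, means, variances, rng=random):
    """Listener keeps the speaker's sign with probability r, else its own sign."""
    r = mh_accept_prob(z_listener, sign_speaker, sign_listener, means, variances)
    return sign_speaker if rng.random() < r else sign_listener
```

In this sketch, a speaker's sign whose category explains the listener's own latent at least as well as the listener's current sign is always accepted; otherwise it is accepted with probability equal to the likelihood ratio. Dropping a modality from the PoE lists leaves the remaining experts and the prior to determine the fused posterior, which is one simple way to tolerate missing modalities.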
Notes:	Roberto Dessì, Pompeu Fabra University, Spain; These authors have contributed equally to this work; Edited by: Jakob Foerster, University of Oxford, United Kingdom; Reviewed by: Angelos Filos, DeepMind Technologies Limited, United Kingdom
DOI:	10.3389/frobt.2023.1290604