Multimodal nonparametric clustering via Monte Carlo method and fusion embedding generated by variational autoencoder

Bibliographic Details
Published in:	Information fusion Vol. 126; p. 103612
Main Authors:	Ma, Yuanchi, He, Hui, Zhang, Gang, Niu, Zhendong
Format:	Journal Article
Language:	English
Published:	Elsevier B.V. 01.02.2026
Subjects:	Monte Carlo method; Multimodal variational autoencoder; Nonparametric clustering
ISSN:	1566-2535

Description
Summary:	Multimodal cluster analysis is a fundamental task in Generative AI (GenAI). However, its application in practical domains, such as multimodal perception for autonomous driving and industrial automation, faces two significant challenges: (1) Determining the number of clusters, k, is challenging, and incorrect k values can considerably degrade the performance of parametric clustering algorithms; (2) The absence of inter-cluster information learning during the clustering process hinders nonparametric clustering algorithms from effectively learning cross-modal shared representations. To address these challenges, we propose a nonparametric deep clustering method, named the multimodal nonparametric clustering algorithm via Monte Carlo method and variational autoencoder (MMCVA). MMCVA is a nonparametric learning framework that integrates a trimodal variational autoencoder and a predictor to effectively learn shared representations across modalities. Within this framework, we also introduce a novel evaluation metric, named Gaussian mixture model clustering overlap. Unlike existing nonparametric methods that rely on a single metric or determine the number of clusters through complex calculations, our approach comprehensively learns both inter-cluster and intra-cluster information, significantly improving the accuracy of k prediction. Extensive experiments conducted on three unimodal and three multimodal public datasets demonstrate that MMCVA enhances the accuracy of k prediction by an average of 13.33% compared to existing unimodal/multimodal nonparametric methods. Additionally, in an “unfair” comparison with parametric methods, MMCVA achieved competitive clustering performance, with the highest improvement in clustering accuracy reaching 2.58%.
Highlights:
• We propose a new nonparametric algorithm for multimodal clustering.
• We propose a new effectiveness metric for finding the number of clusters.
• MMCVA improves the average accuracy by 13.33% compared with the baseline.
DOI:	10.1016/j.inffus.2025.103612
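
The summary names a Gaussian mixture model clustering overlap metric but this record does not define it. As a rough illustration of the general idea of measuring overlap between mixture components with a Monte Carlo method, the Python sketch below estimates how often a point drawn from one component of a 1-D Gaussian mixture would be assigned to a different component. This is not the MMCVA metric: the function names, the `(weight, mean, std)` component format, and the confusion-rate definition of overlap are all assumptions made for illustration.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def monte_carlo_overlap(components, n_samples=20000, seed=0):
    """Estimate the overlap of a 1-D Gaussian mixture by sampling.

    components: list of (weight, mean, std) tuples (hypothetical format).
    Returns the fraction of sampled points that the highest-weighted-density
    rule assigns to a component other than the one that generated them.
    This is an illustrative overlap measure, not the metric from the paper.
    """
    rng = random.Random(seed)
    weights = [w for w, _, _ in components]
    indices = range(len(components))
    confused = 0
    for _ in range(n_samples):
        # Pick the generating component, then draw a point from it.
        source = rng.choices(indices, weights=weights)[0]
        _, mean, std = components[source]
        x = rng.gauss(mean, std)
        # Assign the point to the component with the highest weighted density.
        scores = [w * gaussian_pdf(x, m, s) for w, m, s in components]
        assigned = max(indices, key=scores.__getitem__)
        if assigned != source:
            confused += 1
    return confused / n_samples


# Two candidate mixtures: far-apart components versus nearly coincident ones.
well_separated = [(0.5, 0.0, 1.0), (0.5, 10.0, 1.0)]
heavily_mixed = [(0.5, 0.0, 1.0), (0.5, 0.5, 1.0)]

overlap_separated = monte_carlo_overlap(well_separated)
overlap_mixed = monte_carlo_overlap(heavily_mixed)
print(overlap_separated, overlap_mixed)
```

Under these assumptions, a low estimated overlap suggests the components describe distinct clusters, while a high value suggests the chosen k splits what is effectively one cluster. A score of this kind could in principle be compared across candidate values of k, although how MMCVA actually does this is not stated in the record.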