Multimodal nonparametric clustering via Monte Carlo method and fusion embedding generated by variational autoencoder

Bibliographic Details
Published in:Information fusion Vol. 126; p. 103612
Main Authors: Ma, Yuanchi; He, Hui; Zhang, Gang; Niu, Zhendong
Format: Journal Article
Language:English
Published: Elsevier B.V. 01.02.2026
ISSN:1566-2535
Description
Summary:Multimodal cluster analysis is a fundamental task in Generative AI (GenAI). However, its application in practical domains, such as multimodal perception for autonomous driving and industrial automation, faces two significant challenges: (1) determining the number of clusters, k, is difficult, and incorrect k values can considerably degrade the performance of parametric clustering algorithms; (2) the absence of inter-cluster information learning during the clustering process hinders nonparametric clustering algorithms from effectively learning cross-modal shared representations. To address these challenges, we propose a nonparametric deep clustering method, named the multimodal nonparametric clustering algorithm via Monte Carlo method and variational autoencoder (MMCVA). MMCVA is a nonparametric learning framework that integrates a trimodal variational autoencoder and a predictor to effectively learn shared representations across modalities. Within this framework, we also introduce a novel evaluation metric, named Gaussian mixture model clustering overlap. Unlike existing nonparametric methods that rely on a single metric or determine the number of clusters through complex calculations, our approach learns both inter-cluster and intra-cluster information, significantly improving the accuracy of k prediction. Extensive experiments conducted on three unimodal and three multimodal public datasets demonstrate that MMCVA enhances the accuracy of k prediction by an average of 13.33% compared to existing unimodal/multimodal nonparametric methods. Additionally, in an “unfair” comparison with parametric methods, MMCVA achieved competitive clustering performance, with the highest improvement in clustering accuracy reaching 2.58%.
• We propose a new nonparametric algorithm for multimodal clustering.
• We propose a new effectiveness metric for finding the number of clusters.
• MMCVA improves the average accuracy by 13.33% compared with the baseline.
DOI:10.1016/j.inffus.2025.103612