Multimodal nonparametric clustering via Monte Carlo method and fusion embedding generated by variational autoencoder
| Published in: | Information fusion Vol. 126; p. 103612 |
|---|---|
| Main Authors: | , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier B.V. 01.02.2026 |
| Subjects: | |
| ISSN: | 1566-2535 |
| Summary: | Multimodal cluster analysis is a fundamental task in Generative AI (GenAI). However, its application in practical domains, such as multimodal perception for autonomous driving and industrial automation, faces two significant challenges: (1) determining the number of clusters, k, is difficult, and incorrect k values can considerably degrade the performance of parametric clustering algorithms; (2) the absence of inter-cluster information learning during the clustering process hinders nonparametric clustering algorithms from effectively learning cross-modal shared representations. To address these challenges, we propose a nonparametric deep clustering method, named the multimodal nonparametric clustering algorithm via Monte Carlo method and variational autoencoder (MMCVA). MMCVA is a nonparametric learning framework that integrates a trimodal variational autoencoder and a predictor to effectively learn shared representations across modalities. Within this framework, we also introduce a novel evaluation metric, named Gaussian mixture model clustering overlap. Unlike existing nonparametric methods that rely on a single metric or determine the number of clusters through complex calculations, our approach learns both inter-cluster and intra-cluster information, significantly improving the accuracy of k prediction. Extensive experiments conducted on three unimodal and three multimodal public datasets demonstrate that MMCVA enhances the accuracy of k prediction by an average of 13.33% compared to existing unimodal/multimodal nonparametric methods. Additionally, in an “unfair” comparison with parametric methods, MMCVA achieved competitive clustering performance, with the highest improvement in clustering accuracy reaching 2.58%.
• We propose a new nonparametric algorithm for multimodal clustering. • We propose a new effectiveness metric for finding the number of clusters. • MMCVA improves the average accuracy by 13.33% compared with the baseline. |
|---|---|
| DOI: | 10.1016/j.inffus.2025.103612 |