CTVAE: Contrastive Tabular Variational Autoencoder for imbalance data

Bibliographic details
Published in:	Knowledge and Information Systems, Vol. 67, No. 6, pp. 5335-5354
Authors:	Wang, Alex X.; Le, Minh Quang; Duong, Huu-Thanh; Van, Bay Nguyen; Nguyen, Binh P.
Format:	Journal Article
Language:	English
Published:	London: Springer London, 1 June 2025; Springer Nature B.V.
Keywords:	Accuracy; Classification; Computer Science; Data Mining and Knowledge Discovery; Database Management; Datasets; Deep learning; Diffusion models; Information Storage and Retrieval; Information Systems and Communication Service; Information Systems Applications (incl. Internet); IT in Business; Machine learning; Oversampling; Synthetic data; Imbalance data; Data-centric AI; Contrastive learning
ISSN:	0219-1377, 0219-3116

Description
Abstract:	Class imbalance, where datasets often lack sufficient samples for minority classes, is a persistent challenge in machine learning. Existing solutions often generate synthetic data to mitigate this issue, but they typically struggle with complex data distributions, primarily because they focus on oversampling the minority class while neglecting the relationships with the majority class. To overcome these limitations, we propose the Contrastive Tabular Variational Autoencoder (CTVAE), which integrates conditional Variational Autoencoders with contrastive learning techniques. CTVAE excels at generating high-quality synthetic samples that capture the intricate data distributions of both minority and majority classes. Additionally, it can be seamlessly integrated with variants of the Synthetic Minority Oversampling Technique (SMOTE) for enhanced effectiveness. Experimental results demonstrate that CTVAE substantially improves classification performance on imbalanced datasets, offering a more robust and holistic solution to the class imbalance problem.
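The abstract names CTVAE's ingredients (a conditional Variational Autoencoder combined with contrastive learning) but not its exact objective. The following is a minimal sketch of how such ingredients are commonly combined, and it is not the authors' formulation. It assumes a generic conditional-VAE loss (mean squared reconstruction error plus a Gaussian KL term) with an added SupCon-style supervised contrastive term on the latent means, so that rows with the same class label are pulled together and minority and majority rows are pushed apart in latent space. All function names, the weights `beta` and `gamma`, the temperature `tau`, and the choice of MSE and cosine similarity are hypothetical. The label-conditioned encoder q(z | x, y) and decoder p(x | z, y) are not modeled; their outputs are passed in as plain lists.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def gaussian_kl(mu: Vector, logvar: Vector) -> float:
    """KL(N(mu, diag(exp(logvar))) || N(0, I)) for a single latent vector."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def reconstruction_mse(x: Vector, x_hat: Vector) -> float:
    """Mean squared error between one tabular row and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def _cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def supervised_contrastive(z: Sequence[Vector], labels: Sequence[int], tau: float = 0.5) -> float:
    """SupCon-style loss: low when same-label latents are more similar than other pairs."""
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor with no same-label partner is skipped
        # Temperature-scaled similarities of anchor i to every other row.
        logits = {a: _cosine(z[i], z[a]) / tau for a in range(n) if a != i}
        log_denom = math.log(sum(math.exp(s) for s in logits.values()))
        # Average negative log-probability of each positive against all other rows.
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


def contrastive_cvae_loss(
    xs: Sequence[Vector],
    x_hats: Sequence[Vector],
    mus: Sequence[Vector],
    logvars: Sequence[Vector],
    labels: Sequence[int],
    beta: float = 1.0,
    gamma: float = 1.0,
    tau: float = 0.5,
) -> float:
    """Hypothetical objective: batch-mean ELBO terms plus a weighted contrastive term.

    mus/logvars stand in for a label-conditioned encoder q(z | x, y) and x_hats for
    a label-conditioned decoder p(x | z, y); neither network is modeled here.
    """
    n = len(xs)
    recon = sum(reconstruction_mse(x, xh) for x, xh in zip(xs, x_hats)) / n
    kl = sum(gaussian_kl(m, lv) for m, lv in zip(mus, logvars)) / n
    con = supervised_contrastive(mus, labels, tau)
    return recon + beta * kl + gamma * con


if __name__ == "__main__":
    # Toy latent means: rows 0-1 point one way, rows 2-3 another.
    z = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
    print("clustered labels:", supervised_contrastive(z, [0, 0, 1, 1]))
    print("mixed labels:    ", supervised_contrastive(z, [0, 1, 0, 1]))
```

Running the file prints the contrastive term for the same four latent codes under two labelings; the labeling that matches the geometric clusters gives the smaller value, which is the behaviour a contrastive component of this kind is meant to reward. In a real model these quantities would be computed on tensors and minimized by gradient descent.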
DOI:	10.1007/s10115-025-02377-7