Deterministic Autoencoder using Wasserstein loss for tabular data generation

Bibliographic Details
Published in:	Neural networks Vol. 185; p. 107208
Main Authors:	Wang, Alex X., Nguyen, Binh P.
Format:	Journal Article
Language:	English
Published:	United States: Elsevier Ltd, 01.05.2025
Subjects:	Algorithms; Autoencoder; Deep Learning; Deep neural networks; Generative AI; Humans; Latent space interpolation; Neural Networks, Computer; Tabular data synthesis; Wasserstein Autoencoder
ISSN:	0893-6080, 1879-2782

Description
Summary:	Tabular data generation is a complex task due to its distinctive characteristics and inherent complexities. While Variational Autoencoders have been adapted from the computer vision domain for tabular data synthesis, their reliance on non-deterministic latent space regularization introduces limitations. The stochastic nature of Variational Autoencoders can contribute to collapsed posteriors, yielding suboptimal outcomes and limiting control over the latent space. This characteristic also constrains the exploration of latent space interpolation. To address these challenges, we present the Tabular Wasserstein Autoencoder (TWAE), leveraging the deterministic encoding mechanism of Wasserstein Autoencoders. This characteristic facilitates a deterministic mapping of inputs to latent codes, enhancing the stability and expressiveness of our model’s latent space. This, in turn, enables seamless integration with shallow interpolation mechanisms like the synthetic minority over-sampling technique (SMOTE) within the data generation process via deep learning. Specifically, TWAE is trained once to establish a low-dimensional representation of real data, and various latent interpolation methods efficiently generate synthetic latent points, achieving a balance between accuracy and efficiency. Extensive experiments consistently demonstrate TWAE’s superiority, showcasing its versatility across diverse feature types and dataset sizes. This innovative approach, combining WAE principles with shallow interpolation, effectively leverages SMOTE’s advantages, establishing TWAE as a robust solution for complex tabular data synthesis.
DOI:	10.1016/j.neunet.2025.107208
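
Illustrative sketch (not part of the catalog record or the authors' code): the summary describes encoding real rows once with a deterministic encoder, generating new latent points by SMOTE-style interpolation, and decoding them back to data space. The Python below is a minimal sketch of that pipeline under stated assumptions. The functions `encode` and `decode` are hypothetical linear stand-ins for a trained TWAE encoder and decoder, `smote_latent` is a simplified SMOTE variant that ignores class labels, and the toy rows are invented for illustration only.

```python
import math
import random


def smote_latent(codes, n_new, k=3, seed=0):
    """SMOTE-style interpolation between latent codes (lists of floats).

    Simplified assumption: no class labels; every code can be a base point.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(codes)
        # k nearest neighbours of `base` by Euclidean distance, excluding itself
        others = [c for c in codes if c is not base]
        neighbours = sorted(others, key=lambda c: math.dist(base, c))[:k]
        mate = rng.choice(neighbours)
        lam = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + lam * (m - b) for b, m in zip(base, mate)])
    return synthetic


# Hypothetical stand-ins for a trained deterministic encoder and decoder.
def encode(row):
    return [row[0] + row[1], row[0] - row[1]]


def decode(z):
    return [(z[0] + z[1]) / 2, (z[0] - z[1]) / 2]


# Toy data, invented for illustration.
real_rows = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 4.0]]

latent = [encode(r) for r in real_rows]            # encode once
new_latent = smote_latent(latent, n_new=5, k=2)    # interpolate in latent space
synthetic_rows = [decode(z) for z in new_latent]   # decode to data space
```

Because the encoder is deterministic, each real row maps to a single latent code, so the interpolation step can be repeated or swapped for another latent interpolation method without retraining the model.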