Interpretation for Variational Autoencoder Used to Generate Financial Synthetic Tabular Data

Bibliographic Details
Published in:	Algorithms, Volume 16, Issue 2, p. 121
Main Authors:	Wu, Jinhong; Plataniotis, Konstantinos; Liu, Lucy; Amjadian, Ehsan; Lawryshyn, Yuri
Format:	Journal Article
Language:	English
Published:	Basel, MDPI AG, 1 February 2023
Subjects:	Analysis; Banking industry; Datasets; Deep learning; feature importance; feature interaction; financial synthetic tabular data; interpretability; Privacy; Privacy, Right of; Sensitivity analysis; sensitivity-based method; Software; Statistical methods; Tables (data); variational autoencoder; Visualization; Canada
ISSN:	1999-4893

Description
Summary:	Synthetic data, artificially generated by computer programs, is increasingly used in the financial domain to mitigate privacy concerns. The Variational Autoencoder (VAE) is one of the most popular deep-learning models for generating synthetic data. However, the VAE is often considered a “black box” because of its opacity. Although some studies have offered explanatory insights into VAEs, research explaining how the input data influence the way a VAE creates synthetic data, especially for tabular data, is still lacking, even though most data in the financial industry are stored in tabular format. This paper proposes a sensitivity-based method to assess the impact of input tabular data on how a VAE synthesizes data. The method provides both global and local interpretations efficiently and intuitively. To test the method, a simulated dataset and three Kaggle banking tabular datasets were employed. The results confirmed the applicability of the proposed method.
DOI:	10.3390/a16020121