Evaluating Variational Autoencoder as a Private Data Release Mechanism for Tabular Data

Bibliographic Details
Published in: Proceedings (IEEE Pacific Rim International Symposium on Dependable Computing) pp. 198 - 1988
Main Authors: Li, Szu-Chuang, Tai, Bo-Chen, Huang, Yennun
Format: Conference Proceeding
Language: English
Published: IEEE 01.12.2019
Subjects:
ISSN: 2473-3105
Description
Summary: Multi-market businesses can create value by collecting data from their different business entities and aggregating it across sources. Under privacy regulation, however, exchanging data between business entities of the same parent company can be illegal unless the users have opted in to allow it. Regulations such as the EU's GDPR allow data exchange if the data is appropriately anonymized. In this study, we use a variational autoencoder as a mechanism to generate synthetic data. We measure the privacy and utility of the generated data sets and compare the results with those of a plain autoencoder. The primary findings are: 1) a variational autoencoder can be an option for data exchange, with good accuracy even when the number of latent dimensions is low; 2) a plain autoencoder still provides better accuracy when the number of hidden nodes is high; 3) a variational autoencoder, as a generative model, can be given to a data user, who can then generate their own version of the data that closely mimics the original data set.
DOI: 10.1109/PRDC47002.2019.00050
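
Illustration: The summary describes handing a variational autoencoder to a data user, who then generates synthetic records. The following is a minimal sketch of that data flow and not the authors' implementation. It assumes linear encoder and decoder layers with random, untrained weights, a squared-error reconstruction term, and hypothetical sizes (`N_FEATURES`, `N_LATENT`); all function names are illustrative.

```python
import math
import random

random.seed(0)

N_FEATURES = 4  # hypothetical number of tabular columns
N_LATENT = 2    # hypothetical number of latent dimensions


def rand_matrix(rows, cols):
    # Untrained placeholder weights; a real model would learn these.
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    # Plain matrix-vector product on nested lists.
    return [sum(w * x for w, x in zip(row, v)) for row in m]


W_mu = rand_matrix(N_LATENT, N_FEATURES)
W_logvar = rand_matrix(N_LATENT, N_FEATURES)
W_dec = rand_matrix(N_FEATURES, N_LATENT)


def encode(x):
    """Map one record to the mean and log-variance of q(z | x)."""
    return matvec(W_mu, x), matvec(W_logvar, x)


def reparameterize(mu, logvar):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * random.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]


def decode(z):
    """Map a latent vector back to feature space."""
    return matvec(W_dec, z)


def kl_divergence(mu, logvar):
    """KL(q(z | x) || N(0, I)) for a diagonal Gaussian posterior."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, logvar))


def negative_elbo(x):
    """Training objective: reconstruction error plus the KL term."""
    mu, logvar = encode(x)
    x_hat = decode(reparameterize(mu, logvar))
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))  # assumed loss
    return recon + kl_divergence(mu, logvar)


def generate(n_rows):
    """Sample synthetic records by decoding draws from the prior N(0, I)."""
    return [decode([random.gauss(0.0, 1.0) for _ in range(N_LATENT)])
            for _ in range(n_rows)]


synthetic = generate(5)
```

Only `generate` and the decoder weights are needed to produce new records, which is what makes it possible to give the model, and not the original data, to a data user. A plain autoencoder has no prior to sample from, so it can only reconstruct records it is given.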