Privacy and distribution preserving generative adversarial networks with sample balancing


Detailed bibliography
Published in: Expert Systems with Applications, Vol. 258, Art. no. 125181
Main authors: Sun, Haoran; Tang, Jinchuan; Dang, Shuping; Chen, Gaojie
Format: Journal Article
Language: English
Published: Elsevier Ltd, 15 December 2024
ISSN: 0957-4174
Description
Summary: Differential privacy (DP) generative adversarial networks (GANs) can generate protected synthetic samples for downstream analysis. However, training on unbalanced datasets can bias the network towards majority classes, leaving minority classes undertrained. Meanwhile, gradient perturbation in DP does not guarantee perfect protection of individual data points: due to noisy gradients, training can converge to a suboptimum, or offer no protection when encountering a noise equilibrium. To address these issues, this work proposes a balanced Two-Stage DP-GAN (TS-DPGAN) framework. In Stage I, a data balancing algorithm with sampling techniques reduces the bias and learns features from previously undertrained classes. Compared to a sampling strategy with a fixed reference, a reference interval is introduced to reduce duplication in oversampling and information loss in undersampling. The framework then directly perturbs the balanced samples rather than the gradients, achieving data-wise DP and improving sample diversity. Since data balancing uniformizes the class distribution, a feature-holding strategy is used in Stage II to keep important features from Stage I while restoring the original data distribution. Simulations show that the framework outperforms state-of-the-art (SOTA) algorithms in image quality, distribution preservation, and convergence.

Highlights:
•An interval-based sample balancing algorithm to reduce training bias.
•Direct sample perturbations to guarantee data-wise privacy protection.
•Feature holding to recover the original distribution after balancing training.
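The two Stage-I ideas described in the abstract, resampling each class into a count *interval* rather than to a single fixed reference, and perturbing the balanced samples themselves instead of the gradients, can be sketched roughly as below. This is a minimal illustration based only on the abstract, not the authors' implementation; the function names, the interval bounds `low`/`high`, and the choice of Gaussian noise as the perturbation mechanism are assumptions.

```python
import numpy as np

def interval_balance(X, y, low, high, rng=None):
    """Resample each class into the count interval [low, high].

    Classes below `low` are oversampled (with replacement) up to `low`;
    classes above `high` are undersampled down to `high`; classes already
    inside the interval are left untouched. Compared with balancing every
    class to one fixed count, this limits duplication in oversampling and
    information loss in undersampling.
    """
    rng = np.random.default_rng() if rng is None else rng
    Xb, yb = [], []
    for c in np.unique(y):
        idx = np.flatnonzero(y == c)
        if len(idx) < low:                      # minority: oversample
            idx = rng.choice(idx, size=low, replace=True)
        elif len(idx) > high:                   # majority: undersample
            idx = rng.choice(idx, size=high, replace=False)
        Xb.append(X[idx])
        yb.append(y[idx])
    return np.concatenate(Xb), np.concatenate(yb)

def perturb_samples(X, sigma, rng=None):
    """Add Gaussian noise directly to the balanced samples (data-wise
    perturbation), in place of the usual DP-SGD gradient perturbation."""
    rng = np.random.default_rng() if rng is None else rng
    return X + rng.normal(0.0, sigma, size=X.shape)
```

A GAN would then be trained on the perturbed, balanced samples; calibrating `sigma` to a target privacy budget is outside the scope of this sketch.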
DOI: 10.1016/j.eswa.2024.125181