Privacy and distribution preserving generative adversarial networks with sample balancing
Saved in:
| Published in: | Expert Systems with Applications, Vol. 258, p. 125181 |
|---|---|
| Main Authors: | |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier Ltd, 15.12.2024 |
| Subjects: | |
| ISSN: | 0957-4174 |
| Summary: | Differential privacy (DP) generative adversarial networks (GANs) can generate privacy-protected synthetic samples for downstream analysis. However, training on unbalanced datasets can bias the network towards majority classes, leaving minority classes undertrained. Meanwhile, gradient perturbation in DP offers no guarantee of complete protection for individual data points: with noisy gradients, training can converge to a suboptimum, or offer no protection when a noise equilibrium is encountered. To address these issues, this work proposes a balanced Two-Stage DP-GAN (TS-DPGAN) framework. In Stage I, a data balancing algorithm with sampling techniques reduces the bias and learns features from previously undertrained classes. Compared with a sampling strategy that uses a fixed reference, a reference interval is introduced to reduce duplication in oversampling and information loss in undersampling. The framework then directly perturbs the balanced samples, rather than the gradients, to achieve data-wise DP and improve sample diversity. Since data balancing uniformizes the class distribution, a feature-holding strategy is used in Stage II to retain important features from Stage I while restoring the original data distribution. Simulations show that our framework outperforms state-of-the-art algorithms in image quality, distribution preservation, and convergence. |
|---|---|
| Highlights: | • An interval-based sample balancing algorithm to reduce training bias. • Direct sample perturbation to guarantee data-wise privacy protection. • Feature holding to recover the original distribution after balanced training. (Hedged code sketches of these three steps follow the record.) |
| DOI: | 10.1016/j.eswa.2024.125181 |
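To make the Stage-I idea concrete, here is a minimal resampling sketch of interval-based balancing: each class is pulled into a reference interval around the mean class size rather than to one fixed count, which caps duplication when oversampling minorities and information loss when undersampling majorities. The interval bounds `low_frac`/`high_frac`, the mean-size reference, and the function name are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def balance_with_interval(X, y, low_frac=0.8, high_frac=1.2, rng=None):
    """Resample each class into a reference interval around the mean
    class size instead of forcing every class to one fixed count."""
    rng = rng if rng is not None else np.random.default_rng(0)
    classes, counts = np.unique(y, return_counts=True)
    low = int(low_frac * counts.mean())    # interval floor
    high = int(high_frac * counts.mean())  # interval ceiling

    parts = []
    for c, n in zip(classes, counts):
        idx = np.flatnonzero(y == c)
        if n < low:
            # minority class: oversample only up to the floor,
            # limiting how many duplicated samples are created
            idx = rng.choice(idx, size=low, replace=True)
        elif n > high:
            # majority class: undersample only down to the ceiling,
            # limiting how much information is discarded
            idx = rng.choice(idx, size=high, replace=False)
        parts.append(idx)
    sel = np.concatenate(parts)
    return X[sel], y[sel]

# toy usage: three classes with sizes 1000 / 100 / 40
X = np.random.randn(1140, 8)
y = np.repeat([0, 1, 2], [1000, 100, 40])
Xb, yb = balance_with_interval(X, y)
print(np.unique(yb, return_counts=True))
```

Classes that already fall inside the interval are left untouched, which is the stated advantage over a fixed-reference strategy that resamples every class.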
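Stage I then perturbs the balanced samples themselves rather than the gradients. The sketch below uses the standard Laplace mechanism as a generic stand-in for input perturbation; the paper's actual noise mechanism, sensitivity bound, and privacy accounting are not stated in the abstract, so the [0, 1] normalization and per-coordinate sensitivity here are assumptions.

```python
import numpy as np

def perturb_samples(X, epsilon=1.0, sensitivity=1.0, rng=None):
    """Add Laplace noise directly to the samples (input perturbation),
    instead of perturbing gradients as in DP-SGD-style training."""
    rng = rng if rng is not None else np.random.default_rng(0)
    b = sensitivity / epsilon                       # Laplace scale b = sensitivity / epsilon
    noisy = X + rng.laplace(0.0, b, size=X.shape)   # independent noise per coordinate
    return np.clip(noisy, 0.0, 1.0)                 # assumes inputs normalized to [0, 1]
```

Because the noise is injected once at the data level, the released samples themselves carry the DP guarantee, independent of how the GAN's training later converges.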
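The abstract only names the Stage-II feature-holding strategy. One plausible reading, sketched under heavy assumptions, is to snapshot the Stage-I generator and penalize drift of its intermediate features while Stage II retrains on the original distribution; the `features(z)` accessor and the L2 penalty are hypothetical, not the authors' method.

```python
import copy
import torch
import torch.nn.functional as F

def feature_holding_penalty(gen, gen_stage1, z, weight=0.1):
    """Penalize drift between the current generator's intermediate
    features and those of a frozen Stage-I copy, so features learned
    from balanced data survive retraining on the original
    (unbalanced) distribution. `features(z)` is a hypothetical
    accessor for an intermediate activation."""
    with torch.no_grad():
        ref = gen_stage1.features(z)     # Stage-I features, held fixed
    cur = gen.features(z)                # features after Stage-II updates
    return weight * F.mse_loss(cur, ref)

# at the end of Stage I: keep a frozen snapshot of the generator
# gen_stage1 = copy.deepcopy(gen).eval()
# for p in gen_stage1.parameters():
#     p.requires_grad_(False)
```

Adding this penalty to the usual Stage-II generator loss would let the original data distribution be restored while anchoring the features learned from the previously undertrained classes.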