DiffuseVAE++: Mitigating training-sampling mismatch based on additional noise for higher fidelity image generation

Bibliographic Details
Published in: Neurocomputing (Amsterdam), Vol. 633, p. 129814
Main authors: Yang, Xiaobao; Luo, Wei; Ning, Hailong; Zhang, Guorui; Sun, Wei; Ma, Sugang
Format: Journal Article
Language: English
Published: Elsevier B.V., 7 June 2025
ISSN: 0925-2312
Description
Summary: Denoising Diffusion Probabilistic Models (DDPMs) have demonstrated remarkable results in image generation. However, there is a mismatch between the training and sampling processes in current diffusion models. In addition, a U-Net denoising network based on simple residual blocks cannot predict noise accurately, which degrades the quality of the generated images. To address these limitations, we present a novel image generation method that achieves higher fidelity. First, our method alleviates the mismatch by adding extra standard Gaussian noise in the forward diffusion process, which does not disrupt that process. Second, we present an efficient U-Net-based denoising network built on the Simplified Nonlinear No Activation (SNNA) block, in which our proposed Simple Squeeze-Excitation and Simple GLU, combined with Depthwise Separable Convolution, enhance the model's ability to predict the real noise. Third, considering the structural characteristics of the baseline model, we introduce an additional cross-attention mechanism that lets the DDPM attend to features from the VAE stage, allowing the model to capture and learn the noise information more accurately. Finally, extensive experiments show that the proposed DiffuseVAE++ obtains significant gains in FID, improving from 3.84 to 2.41 on CIFAR-10 and from 3.94 to 2.30 on CelebA-64. In particular, the IS score on CIFAR-10 reaches 10.10, which is competitive with current state-of-the-art methods (e.g., U-ViT, StyleGAN2).
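Illustrative note: the summary does not give the exact form of the additional noise. As a minimal sketch only, the following assumes it is an independent standard Gaussian perturbation, scaled by a hypothetical factor `gamma`, added to the usual DDPM forward sample while the network still regresses the original noise `eps`. The function names, the linear beta schedule, and the value of `gamma` are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def perturbed_forward_sample(x0, t, alpha_bar, gamma=0.1, rng=None):
    """Draw a noisy training input at step t with an extra Gaussian term.

    Standard DDPM: x_t = sqrt(a_t) * x0 + sqrt(1 - a_t) * eps.
    Assumed variant: eps is replaced by (eps + gamma * xi) in the input only;
    the regression target returned to the caller is still eps.
    """
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal(x0.shape)  # noise the network should predict
    xi = rng.standard_normal(x0.shape)   # additional noise (hypothetical form)
    a_t = alpha_bar[t]
    x_t = np.sqrt(a_t) * x0 + np.sqrt(1.0 - a_t) * (eps + gamma * xi)
    return x_t, eps

if __name__ == "__main__":
    alpha_bar = linear_alpha_bar()
    x0 = np.zeros((4, 3, 32, 32))  # a dummy CIFAR-10-sized batch
    x_t, target = perturbed_forward_sample(x0, t=500, alpha_bar=alpha_bar)
    print(x_t.shape, target.shape)
```

Setting `gamma=0` recovers the standard forward process, so under these assumptions the extra term changes only the inputs seen during training and leaves the noise schedule itself untouched.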
DOI: 10.1016/j.neucom.2025.129814