DiffuseVAE++: Mitigating training-sampling mismatch based on additional noise for higher fidelity image generation

Bibliographic Details
Published in: Neurocomputing (Amsterdam) Vol. 633; p. 129814
Main Authors: Yang, Xiaobao, Luo, Wei, Ning, Hailong, Zhang, Guorui, Sun, Wei, Ma, Sugang
Format: Journal Article
Language: English
Published: Elsevier B.V. 07.06.2025
ISSN: 0925-2312
Description
Summary: Denoising Diffusion Probabilistic Models (DDPMs) have demonstrated remarkable results in image generation. However, current diffusion models exhibit a mismatch between the training and sampling processes. In addition, a U-Net denoising network built from simple residual blocks cannot predict the noise accurately, which degrades the quality of the generated images. To address these limitations, we present a novel image generation method that achieves higher fidelity. First, our method alleviates the mismatch by adding extra standard Gaussian noise in the forward diffusion process, without disrupting that process. Second, we present an efficient U-Net-based denoising network built on the Simplified Nonlinear No Activation (SNNA) block, in which our proposed Simple Squeeze-Excitation and Simple GLU, combined with Depthwise Separable Convolution, enhance the model's ability to predict the real noise. Furthermore, considering the structural characteristics of the baseline model, we introduce an additional cross-attention mechanism that lets the DDPM attend to features from the VAE stage, allowing the model to capture and learn the noise information more accurately. Finally, extensive experiments show that the proposed DiffuseVAE++ obtains significant gains in FID, improving from 3.84 to 2.41 on CIFAR-10 and from 3.94 to 2.30 on CelebA-64. In particular, the IS on CIFAR-10 reaches 10.10, which is competitive with current state-of-the-art methods (e.g., U-ViT, StyleGAN2).
DOI: 10.1016/j.neucom.2025.129814
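
Illustrative note: the summary says that extra standard Gaussian noise is added in the forward diffusion process, but it does not give the exact formulation. The sketch below is therefore only a guess at the general idea, not the authors' method. It uses the standard DDPM closed-form forward step and assumes the extra noise enters as a second Gaussian draw scaled by a hypothetical factor `gamma`. The function names, the linear beta schedule, and `gamma` are all assumptions made for illustration.

```python
import math
import random


def linear_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule."""
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def forward_sample(x0, alpha_bar, gamma=0.0, rng=random):
    """Draw a noisy sample x_t from clean data x0 (a flat list of floats).

    gamma == 0 is the standard DDPM forward step:
        x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps
    gamma > 0 adds a second standard Gaussian draw xi. This form is an
    ASSUMPTION for illustration, not taken from the paper:
        x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * (eps + gamma * xi)
    Returns (x_t, eps); eps stays the target the denoiser would regress.
    """
    a = math.sqrt(alpha_bar)
    s = math.sqrt(1.0 - alpha_bar)
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xi = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [a * x + s * (e + gamma * z) for x, e, z in zip(x0, eps, xi)]
    return x_t, eps
```

In this sketch the network input is perturbed during training while the regression target `eps` is left unchanged, which is one plausible reading of "does not disrupt the forward process". Setting `gamma=0.0` recovers the unmodified DDPM step.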