RDDM: A Rate-Distortion Guided Diffusion Model for Learned Image Compression Enhancement
| Published in: | IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 15, no. 2, pp. 186-199 |
|---|---|
| Main authors: | , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Piscataway: IEEE, 1 June 2025; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| ISSN: | 2156-3357, 2156-3365 |
| Abstract: | Denoising diffusion probabilistic models (DDPMs) have achieved significant success in various image generation tasks, but their application to image compression, especially learned image compression (LIC), remains limited. In this study, we introduce a rate-distortion (RD) guided diffusion model, referred to as RDDM, to enhance the performance of LIC. In RDDM, LIC is treated as a lossy codec function constrained by RD; through encoding and decoding, it divides the input image into two parts: the reconstructed image and the residual image. The construction of RDDM rests on two points. First, RDDM treats diffusion models as repositories of image structures and textures, learned from extensive real-world datasets. Under the guidance of RD constraints, it extracts the structural and textural priors it needs from these repositories to restore the input image. Second, RDDM employs a Bayesian network to progressively infer the input image from the reconstructed image and its codec function. Additionally, we find that RDDM's performance declines when its codec function does not match the reconstructed image; using the highest-bitrate codec function minimizes this drop. The resulting model is referred to as $\text{RDDM}^{\star}$. Experimental results indicate that both RDDM and $\text{RDDM}^{\star}$ can be applied to LICs with various architectures, such as CNN, Transformer, and hybrid designs. They significantly improve the fidelity of these codecs while maintaining, and to some extent enhancing, perceptual quality. |
| DOI: | 10.1109/JETCAS.2025.3563228 |
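The abstract's core decomposition (a lossy codec splits an image into a reconstruction plus a residual, and the image is then inferred from the reconstruction and the codec function) can be illustrated with a toy. This is a minimal sketch under stated assumptions, not RDDM itself: a uniform scalar quantizer (the hypothetical `toy_codec`, step `q`, smaller `q` standing in for higher bitrate) replaces the LIC encode/decode round trip, and a fixed pull toward a hypothetical `prior_mean` replaces the diffusion prior. The projection step reflects one plausible reading of "guided by the codec function": only restorations that the codec maps back to the observed reconstruction are accepted.

```python
import math
from typing import List, Tuple


def toy_codec(x: List[float], q: float) -> List[float]:
    """Hypothetical stand-in for an LIC encode/decode round trip:
    round each value to the nearest multiple of q (ties round up)."""
    return [q * math.floor(v / q + 0.5) for v in x]


def decompose(x: List[float], q: float) -> Tuple[List[float], List[float]]:
    """Split x into the codec reconstruction and the residual, x = x_hat + r."""
    x_hat = toy_codec(x, q)
    residual = [v - h for v, h in zip(x, x_hat)]
    return x_hat, residual


def project_consistent(y: List[float], x_hat: List[float], q: float) -> List[float]:
    """Clamp y into the quantization cell around x_hat, so toy_codec(y) == x_hat.
    The half-width is shrunk slightly because the true cell is half-open."""
    half = 0.5 * q * (1.0 - 1e-9)
    return [min(max(v, h - half), h + half) for v, h in zip(y, x_hat)]


def restore(x_hat: List[float], q: float, prior_mean: List[float],
            steps: int = 50, step_size: float = 0.5) -> List[float]:
    """Toy progressive inference (projected gradient descent on 0.5*||y - prior||^2):
    each step moves toward the hypothetical prior, then re-enforces codec consistency."""
    y = list(x_hat)
    for _ in range(steps):
        y = [v + step_size * (p - v) for v, p in zip(y, prior_mean)]
        y = project_consistent(y, x_hat, q)
    return y


if __name__ == "__main__":
    x = [0.3, 1.7, 2.2, 3.9]
    x_hat, r = decompose(x, q=1.0)
    print(x_hat)  # → [0.0, 2.0, 2.0, 4.0]
    y = restore(x_hat, q=1.0, prior_mean=[0.4, 1.5, 2.0, 4.8])
    print(y)
    print(toy_codec(y, 1.0) == x_hat)  # → True
```

In this toy, the restored signal converges to the prior's values clamped into each quantization cell, so it always re-encodes to the same reconstruction. A learned diffusion prior would replace the constant pull, and the per-pixel cell would be replaced by whatever consistency condition the real codec imposes.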