Joint channel–spatial entropy modeling for efficient visual coding
| Published in: | Neural Computing & Applications, Vol. 37, No. 21, pp. 17111–17128 |
|---|---|
| Main authors: | |
| Format: | Journal Article |
| Language: | English |
| Published: | London: Springer London, 01.07.2025 (Springer Nature B.V.) |
| ISSN: | 0941-0643, 1433-3058 |
| Abstract: | Deep learning-based methods have recently achieved impressive performance in lossy image compression, surpassing traditional codecs in rate-distortion efficiency. However, current learned compressors still struggle to fully exploit cross-channel redundancies and long-range spatial dependencies in their latent representations, and many rely on sequential context models that slow down decoding. To address these issues, we propose a novel compression framework that performs joint channel–spatial context modeling for improved entropy coding. Our approach introduces a Multi-Dimensional Conditional Context (MDCC) architecture, which integrates a new non-serial channel-wise context model with spatial context conditioning to capture inter-channel correlations and local dependencies simultaneously. In addition, we design a Residual Local–Global Enhancement module that combines ConvNeXt convolutional blocks and Swin Transformer-based blocks to capture fine-grained textures and global image structure in the latent representation. By augmenting the standard hyperprior with these rich contextual cues, the proposed method more accurately estimates latent distributions, leading to superior compression performance. Experiments on the Kodak and CLIC image datasets demonstrate that the proposed approach achieves up to a 17% bit-rate reduction over the latest VVC (H.266) standard at comparable quality. Furthermore, our model eliminates the autoregressive decoding bottleneck, enabling nearly 10× faster decoding compared to previous state-of-the-art learned compression models. These results establish the effectiveness of joint channel–spatial context modeling and highlight the potential of the proposed MDCC framework for practical, high-performance neural image compression. |
|---|---|
| DOI: | 10.1007/s00521-025-11138-0 |
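
The abstract above outlines the MDCC entropy model but gives no implementation detail. The following is a minimal PyTorch sketch of one plausible reading of "joint channel–spatial context modeling": latent channels are split into slices conditioned on the hyperprior and earlier slices (channel context), while each slice uses a two-pass checkerboard spatial context in the style of He et al.'s checkerboard context model, rather than a pixel-serial scan. Every name, channel width, and the slice count here is an illustrative assumption, not the paper's actual architecture.

```python
import torch
import torch.nn as nn


def checkerboard_mask(h, w, device):
    # 1.0 at "anchor" positions, 0.0 at "non-anchor" positions.
    ij = torch.arange(h, device=device)[:, None] + torch.arange(w, device=device)[None, :]
    return (ij % 2 == 0).float()


class ChannelSpatialEntropyModel(nn.Module):
    """Predicts per-element Gaussian parameters (mu, sigma) for the latent y.

    Channels are split into slices; slice i is conditioned on the hyperprior
    and all earlier slices (channel context). Within each slice, a
    checkerboard split gives a two-pass spatial context: anchors use no
    spatial context, non-anchors additionally see their anchor neighbours.
    (Hypothetical layout; the paper's MDCC wiring is not specified here.)
    """

    def __init__(self, latent_ch=192, hyper_ch=128, num_slices=4):
        super().__init__()
        self.num_slices = num_slices
        sc = latent_ch // num_slices
        # Spatial context over the anchor half of each slice.
        self.spatial_ctx = nn.ModuleList(
            nn.Conv2d(sc, 2 * sc, kernel_size=5, padding=2)
            for _ in range(num_slices)
        )
        # Parameter nets for anchors: hyperprior + earlier slices only.
        self.anchor_nets = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(hyper_ch + i * sc, 256, 1), nn.GELU(),
                nn.Conv2d(256, 2 * sc, 1),
            ) for i in range(num_slices)
        )
        # Parameter nets for non-anchors: also see the spatial context.
        self.nonanchor_nets = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(hyper_ch + i * sc + 2 * sc, 256, 1), nn.GELU(),
                nn.Conv2d(256, 2 * sc, 1),
            ) for i in range(num_slices)
        )

    def forward(self, y, hyper):
        # y: quantized latent (B, C, H, W); hyper: hyperprior features with
        # matching spatial size (B, hyper_ch, H, W).
        _, _, h, w = y.shape
        mask = checkerboard_mask(h, w, y.device)
        slices = y.chunk(self.num_slices, dim=1)
        decoded, mus, sigmas = [], [], []
        for i, s in enumerate(slices):
            ctx = torch.cat([hyper] + decoded, dim=1)   # channel context
            p_anchor = self.anchor_nets[i](ctx)
            sp = self.spatial_ctx[i](s * mask)          # sees anchors only
            p_non = self.nonanchor_nets[i](torch.cat([ctx, sp], dim=1))
            params = p_anchor * mask + p_non * (1.0 - mask)
            mu, log_sigma = params.chunk(2, dim=1)
            mus.append(mu)
            sigmas.append(log_sigma.exp())              # keep sigma positive
            decoded.append(s)
        return torch.cat(mus, dim=1), torch.cat(sigmas, dim=1)
```

For example, `ChannelSpatialEntropyModel()(torch.randn(1, 192, 16, 16).round(), torch.randn(1, 128, 16, 16))` returns `mu` and `sigma` of shape `(1, 192, 16, 16)`. At decode time each slice needs only two parallel passes (anchors, then non-anchors), so the number of entropy-decoding steps is 2 × num_slices regardless of image size; a fixed, shallow schedule of this kind is how such designs avoid the autoregressive bottleneck the abstract refers to.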
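The Residual Local–Global Enhancement module is likewise only named in the abstract. Below is a hedged sketch of one way to combine a ConvNeXt block (local texture) with a window-attention block standing in for a full Swin Transformer block (global structure), fused residually. The branch layout, widths, and window size are assumptions; for brevity the sketch omits Swin's shifted windows and relative position bias, and it requires spatial dimensions divisible by the window size.

```python
import torch
import torch.nn as nn


class ConvNeXtBlock(nn.Module):
    """Standard ConvNeXt block: depthwise 7x7 conv + pointwise MLP, residual."""

    def __init__(self, dim):
        super().__init__()
        self.dw = nn.Conv2d(dim, dim, 7, padding=3, groups=dim)
        self.norm = nn.LayerNorm(dim)
        self.pw1 = nn.Linear(dim, 4 * dim)
        self.act = nn.GELU()
        self.pw2 = nn.Linear(4 * dim, dim)

    def forward(self, x):
        r = x
        x = self.dw(x)
        x = x.permute(0, 2, 3, 1)            # (B, H, W, C) for LayerNorm/Linear
        x = self.pw2(self.act(self.pw1(self.norm(x))))
        return r + x.permute(0, 3, 1, 2)


class WindowAttentionBlock(nn.Module):
    """Self-attention within non-overlapping windows (simplified Swin block)."""

    def __init__(self, dim, window=8, heads=4):
        super().__init__()
        self.window = window
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x):
        b, c, h, w = x.shape
        ws = self.window
        # Partition into (ws x ws) windows -> (B * num_windows, ws*ws, C).
        x = x.view(b, c, h // ws, ws, w // ws, ws)
        x = x.permute(0, 2, 4, 3, 5, 1).reshape(-1, ws * ws, c)
        n = self.norm1(x)
        a, _ = self.attn(n, n, n, need_weights=False)
        x = x + a
        x = x + self.mlp(self.norm2(x))
        # Reverse the window partition back to (B, C, H, W).
        x = x.view(b, h // ws, w // ws, ws, ws, c)
        return x.permute(0, 5, 1, 3, 2, 4).reshape(b, c, h, w)


class ResidualLocalGlobalEnhancement(nn.Module):
    """Hypothetical fusion: local + global branches, 1x1 fuse, residual add."""

    def __init__(self, dim=192, window=8):
        super().__init__()
        self.local = ConvNeXtBlock(dim)
        self.globl = WindowAttentionBlock(dim, window)
        self.fuse = nn.Conv2d(2 * dim, dim, 1)

    def forward(self, x):
        return x + self.fuse(torch.cat([self.local(x), self.globl(x)], dim=1))
```

As a smoke test, `ResidualLocalGlobalEnhancement(192)(torch.randn(1, 192, 16, 16))` returns a tensor of the same shape. The parallel-branch fusion is one reasonable interpretation of "combines ConvNeXt convolutional blocks and Swin Transformer-based blocks"; a sequential stacking of the two block types would be an equally plausible reading of the abstract.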