Joint channel–spatial entropy modeling for efficient visual coding

Published in: Neural Computing & Applications, Vol. 37, No. 21, pp. 17111–17128
Main authors: Li, Yuan; Jiang, Xiaotong; Sun, Zitang
Format: Journal Article
Language: English
Publication details: London: Springer London (Springer Nature B.V.), 01.07.2025
ISSN: 0941-0643, 1433-3058
Description
Summary: Deep learning-based methods have recently achieved impressive performance in lossy image compression, surpassing traditional codecs in rate-distortion efficiency. However, current learned compressors still struggle to fully exploit cross-channel redundancies and long-range spatial dependencies in their latent representations, and many rely on sequential context models that slow down decoding. To address these issues, we propose a novel compression framework that performs joint channel–spatial context modeling for improved entropy coding. Our approach introduces a Multi-Dimensional Conditional Context (MDCC) architecture, which integrates a new non-serial channel-wise context model with spatial context conditioning to capture inter-channel correlations and local dependencies simultaneously. In addition, we design a Residual Local–Global Enhancement module that combines ConvNeXt convolutional blocks with Swin Transformer-based blocks to capture fine-grained textures and global image structure in the latent representation. By augmenting the standard hyperprior with these rich contextual cues, the proposed method estimates latent distributions more accurately, leading to superior compression performance. Experiments on the Kodak and CLIC image datasets demonstrate that the proposed approach achieves up to a 17% bit-rate reduction over the latest VVC (H.266) standard at comparable quality. Furthermore, our model eliminates the autoregressive decoding bottleneck, enabling nearly 10× faster decoding than previous state-of-the-art learned compression models. These results establish the effectiveness of joint channel–spatial context modeling and highlight the potential of the proposed MDCC framework for practical, high-performance neural image compression.
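The central idea described in the summary, conditioning the entropy model on channel groups instead of a pixel-by-pixel autoregressive scan, can be illustrated with a generic sketch. The code below is not the authors' MDCC implementation; it is a minimal PyTorch illustration, with an assumed class name (ChannelGroupEntropyModel) and assumed layer sizes, of how Gaussian entropy parameters for each latent channel group can be predicted from the hyperprior feature together with previously decoded groups, so that all spatial positions within a group are processed in parallel.

```python
# Illustrative sketch only: a generic channel-grouped conditional entropy model,
# NOT the MDCC architecture from the paper. Names and sizes are assumptions.
import torch
import torch.nn as nn


class ChannelGroupEntropyModel(nn.Module):
    """Predict per-element Gaussian (mu, sigma) for a latent y split into
    channel groups; each group is conditioned on the hyperprior feature and
    on all previously decoded groups (channel-wise context). Groups are
    processed one after another, but every spatial position inside a group
    is handled in parallel (no spatial autoregression)."""

    def __init__(self, latent_channels=192, num_groups=4, hyper_channels=192):
        super().__init__()
        assert latent_channels % num_groups == 0
        self.num_groups = num_groups
        self.group_ch = latent_channels // num_groups
        # One small conditioning network per group; its input is the
        # hyperprior feature concatenated with already-decoded groups.
        self.param_nets = nn.ModuleList()
        for g in range(num_groups):
            in_ch = hyper_channels + g * self.group_ch
            self.param_nets.append(nn.Sequential(
                nn.Conv2d(in_ch, 128, kernel_size=3, padding=1),
                nn.GELU(),
                nn.Conv2d(128, 2 * self.group_ch, kernel_size=3, padding=1),
            ))

    def forward(self, y, hyper_feat):
        """y: (B, C, H, W) latent; hyper_feat: (B, hyper_channels, H, W)."""
        groups = torch.chunk(y, self.num_groups, dim=1)
        decoded = []          # channel groups already "decoded"
        means, scales = [], []
        for g in range(self.num_groups):
            ctx = torch.cat([hyper_feat] + decoded, dim=1)
            mu, log_sigma = torch.chunk(self.param_nets[g](ctx), 2, dim=1)
            means.append(mu)
            scales.append(torch.exp(log_sigma))
            # During training we condition on the true group; at decode time
            # this would be the dequantized group reconstructed so far.
            decoded.append(groups[g])
        return torch.cat(means, dim=1), torch.cat(scales, dim=1)


if __name__ == "__main__":
    model = ChannelGroupEntropyModel()
    y = torch.randn(1, 192, 16, 16)
    z_feat = torch.randn(1, 192, 16, 16)
    mu, sigma = model(y, z_feat)
    print(mu.shape, sigma.shape)  # both (1, 192, 16, 16)
```

Conditioning on whole channel groups rather than on individual spatial neighbors is, in general, what removes the serial decoding bottleneck the summary refers to; the paper's spatial context conditioning and Residual Local–Global Enhancement module would add further cues on top of such a scheme.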
DOI: 10.1007/s00521-025-11138-0