Joint channel–spatial entropy modeling for efficient visual coding

Bibliographic Details
Published in: Neural Computing & Applications, Vol. 37, No. 21, pp. 17111–17128
Main Authors: Li, Yuan; Jiang, Xiaotong; Sun, Zitang
Format: Journal Article
Language: English
Published: Springer London (Springer Nature B.V.), London, 01.07.2025
ISSN: 0941-0643, 1433-3058
Description
Summary: Deep learning-based methods have recently achieved impressive performance in lossy image compression, surpassing traditional codecs in rate-distortion efficiency. However, current learned compressors still struggle to fully exploit cross-channel redundancies and long-range spatial dependencies in their latent representations, and many rely on sequential context models that slow down decoding. To address these issues, we propose a novel compression framework that performs joint channel–spatial context modeling for improved entropy coding. Our approach introduces a Multi-Dimensional Conditional Context (MDCC) architecture, which integrates a new non-serial channel-wise context model with spatial context conditioning to capture inter-channel correlations and local dependencies simultaneously. In addition, we design a Residual Local–Global Enhancement module that combines ConvNeXt convolutional blocks and Swin Transformer-based blocks to capture fine-grained textures and global image structure in the latent representation. By augmenting the standard hyperprior with these rich contextual cues, the proposed method more accurately estimates latent distributions, leading to superior compression performance. Experiments on the Kodak and CLIC image datasets demonstrate that the proposed approach achieves up to a 17% bit-rate reduction over the latest VVC (H.266) standard at comparable quality. Furthermore, our model eliminates the autoregressive decoding bottleneck, enabling nearly 10× faster decoding compared to previous state-of-the-art learned compression models. These results establish the effectiveness of joint channel–spatial context modeling and highlight the potential of the proposed MDCC framework for practical, high-performance neural image compression.
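
To make the idea of joint channel–spatial context modeling concrete, the following is a minimal PyTorch sketch, not the authors' implementation. It assumes the quantized latent is split into a few channel groups decoded stage by stage (a non-serial channel-wise context), that a checkerboard mask supplies the spatial context within each group, and that a hyperprior feature conditions every group. The module names, channel counts, and group count are hypothetical choices made purely for illustration.

```python
# Illustrative sketch of a joint channel-spatial context model for entropy
# coding (hypothetical; not the paper's MDCC code). Assumptions:
#   - the latent y is split into `groups` channel groups decoded in stages,
#   - a checkerboard mask provides spatial context within each group,
#   - a hyperprior feature conditions every group.
import torch
import torch.nn as nn


def checkerboard_mask(h, w, device=None):
    """Binary mask selecting the 'anchor' half of a checkerboard pattern."""
    yy, xx = torch.meshgrid(
        torch.arange(h, device=device),
        torch.arange(w, device=device),
        indexing="ij",
    )
    return ((yy + xx) % 2 == 0).float()[None, None]  # shape (1, 1, H, W)


class JointChannelSpatialContext(nn.Module):
    """Predicts Gaussian (mean, scale) parameters for each channel group of
    the latent, conditioned on the hyperprior feature, all previously decoded
    channel groups, and the spatially visible (anchor) positions of the
    current group."""

    def __init__(self, latent_ch=192, groups=4, hyper_ch=192, hidden=256):
        super().__init__()
        assert latent_ch % groups == 0
        self.groups = groups
        self.gc = latent_ch // groups
        self.param_nets = nn.ModuleList()
        for g in range(groups):
            # Inputs: hyperprior + previously decoded groups + masked current group.
            in_ch = hyper_ch + g * self.gc + self.gc
            self.param_nets.append(
                nn.Sequential(
                    nn.Conv2d(in_ch, hidden, 3, padding=1),
                    nn.GELU(),
                    nn.Conv2d(hidden, hidden, 3, padding=1),
                    nn.GELU(),
                    nn.Conv2d(hidden, 2 * self.gc, 1),  # -> (mean, scale)
                )
            )

    def forward(self, y_hat, hyper_feat):
        """y_hat: quantized latent (B, C, H, W); hyper_feat: (B, hyper_ch, H, W)."""
        _, _, h, w = y_hat.shape
        mask = checkerboard_mask(h, w, device=y_hat.device)
        means, scales, decoded = [], [], []
        for g, net in enumerate(self.param_nets):
            cur = y_hat[:, g * self.gc:(g + 1) * self.gc]
            ctx = torch.cat(decoded + [hyper_feat, cur * mask], dim=1)
            mean, scale = net(ctx).chunk(2, dim=1)
            means.append(mean)
            scales.append(nn.functional.softplus(scale))
            # At decode time this would be the already-dequantized group.
            decoded.append(cur)
        return torch.cat(means, dim=1), torch.cat(scales, dim=1)
```

In a scheme of this kind, each channel group is decoded in one convolutional pass (anchor positions first, then the remaining checkerboard positions), rather than pixel by pixel as in an autoregressive context model, which is what allows the serial decoding bottleneck mentioned in the abstract to be avoided.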
DOI: 10.1007/s00521-025-11138-0