Joint channel–spatial entropy modeling for efficient visual coding

Bibliographic details
Published in: Neural Computing & Applications, Vol. 37, No. 21, pp. 17111–17128
Main authors: Li, Yuan; Jiang, Xiaotong; Sun, Zitang
Format: Journal Article
Language: English
Published: London: Springer London, 01.07.2025 (Springer Nature B.V.)
ISSN: 0941-0643, 1433-3058
Online access: Full text
Description

Abstract: Deep learning-based methods have recently achieved impressive performance in lossy image compression, surpassing traditional codecs in rate-distortion efficiency. However, current learned compressors still struggle to fully exploit cross-channel redundancies and long-range spatial dependencies in their latent representations, and many rely on sequential context models that slow down decoding. To address these issues, we propose a novel compression framework that performs joint channel–spatial context modeling for improved entropy coding. Our approach introduces a Multi-Dimensional Conditional Context (MDCC) architecture, which integrates a new non-serial channel-wise context model with spatial context conditioning to capture inter-channel correlations and local dependencies simultaneously. In addition, we design a Residual Local–Global Enhancement module that combines ConvNeXt convolutional blocks with Swin Transformer-based blocks to capture fine-grained textures and global image structure in the latent representation. By augmenting the standard hyperprior with these rich contextual cues, the proposed method estimates latent distributions more accurately, leading to superior compression performance. Experiments on the Kodak and CLIC image datasets demonstrate that the proposed approach achieves up to a 17% bit-rate reduction over the latest VVC (H.266) standard at comparable quality. Furthermore, our model eliminates the autoregressive decoding bottleneck, enabling nearly 10× faster decoding than previous state-of-the-art learned compression models. These results establish the effectiveness of joint channel–spatial context modeling and highlight the potential of the proposed MDCC framework for practical, high-performance neural image compression.
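The abstract describes the MDCC idea only at a high level. The sketch below illustrates, under stated assumptions, how a joint channel–spatial context model can feed entropy parameter estimation in a learned codec. It is not the authors' implementation: the class and attribute names (ChannelSpatialEntropyModel, channel_ctx, spatial_ctx, param_nets), the slice count, the layer widths, and the checkerboard-style spatial conditioning are all illustrative stand-ins for whatever the paper actually uses.

```python
# Hypothetical sketch of joint channel–spatial context modeling for entropy
# parameter estimation, assuming a PyTorch setting with Gaussian-parameterised
# latents. This is NOT the paper's MDCC implementation; slice count, widths,
# and the checkerboard-style spatial conditioning are illustrative guesses.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ChannelSpatialEntropyModel(nn.Module):
    """Predicts per-element Gaussian parameters (mu, sigma) for each latent
    slice from (a) the hyperprior features, (b) previously processed channel
    slices, and (c) a masked spatial context of the current slice."""

    def __init__(self, latent_ch: int = 192, num_slices: int = 4, hyper_ch: int = 192):
        super().__init__()
        assert latent_ch % num_slices == 0
        self.num_slices = num_slices
        slice_ch = latent_ch // num_slices

        # Channel-wise context: slice i is conditioned on slices 0..i-1.
        self.channel_ctx = nn.ModuleList([
            nn.Identity() if i == 0 else nn.Sequential(
                nn.Conv2d(slice_ch * i, 64, kernel_size=3, padding=1),
                nn.GELU(),
                nn.Conv2d(64, 64, kernel_size=3, padding=1),
            )
            for i in range(num_slices)
        ])

        # Spatial context: convolution over the "anchor" half of a
        # checkerboard split of the current slice, a common non-serial
        # alternative to raster-scan autoregression.
        self.spatial_ctx = nn.ModuleList([
            nn.Conv2d(slice_ch, 64, kernel_size=5, padding=2)
            for _ in range(num_slices)
        ])

        # Fuse hyperprior + channel context + spatial context into (mu, sigma).
        self.param_nets = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(hyper_ch + 64 + 64, 128, kernel_size=1),
                nn.GELU(),
                nn.Conv2d(128, 2 * slice_ch, kernel_size=1),
            )
            for _ in range(num_slices)
        ])

    def forward(self, y: torch.Tensor, hyper: torch.Tensor):
        # y: latent tensor (B, latent_ch, H, W); hyper: hyperprior features
        # at the same spatial resolution (B, hyper_ch, H, W).
        slices = torch.chunk(y, self.num_slices, dim=1)
        mus, sigmas, seen = [], [], []
        for i, y_i in enumerate(slices):
            if i == 0:
                ch_feat = y.new_zeros(y.size(0), 64, y.size(2), y.size(3))
            else:
                ch_feat = self.channel_ctx[i](torch.cat(seen, dim=1))

            # Checkerboard mask: only anchor positions of the current slice
            # contribute spatial context. A real codec would code the anchor
            # positions first (hyperprior + channel context only) and then
            # condition the remaining positions on them.
            mask = torch.zeros_like(y_i)
            mask[..., 0::2, 0::2] = 1.0
            mask[..., 1::2, 1::2] = 1.0
            sp_feat = self.spatial_ctx[i](y_i * mask)

            params = self.param_nets[i](torch.cat([hyper, ch_feat, sp_feat], dim=1))
            mu, sigma = params.chunk(2, dim=1)
            mus.append(mu)
            sigmas.append(F.softplus(sigma) + 1e-6)
            seen.append(y_i)  # at decode time this would be the dequantised slice
        return torch.cat(mus, dim=1), torch.cat(sigmas, dim=1)
```

Because each slice here depends only on already-available slices and on one fixed checkerboard pass, rather than on a pixel-by-pixel raster scan, all positions within a pass can be computed in parallel. That is the kind of non-serial conditioning the abstract credits for removing the autoregressive decoding bottleneck and speeding up decoding.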
DOI: 10.1007/s00521-025-11138-0