A Low Complexity Algorithm for 3D-HEVC Depth Map Intra Coding Based on MAD and ResNet

Bibliographic details
Published in: IEEE Access, vol. 13, pp. 111722-111732
Main authors: Tian, Erlin; Zhang, Jiabao; Zhang, Qiuwen
Format: Journal Article
Language: English
Published: Piscataway: IEEE, 2025
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN: 2169-3536
Description
Abstract: As an extension of HEVC, 3D-HEVC retains HEVC's quadtree structure and is currently recognized as the most widely adopted international standard for stereoscopic video coding. In intra coding, quadtree partitioning is determined recursively through rate-distortion cost calculations, a process that demands extensive computation and results in high encoding complexity. To mitigate this, the paper proposes a deep learning-based encoding algorithm designed to replace the intricate coding unit (CU) partitioning process used in HTM. First, we introduce the Mean Absolute Difference (MAD), which quantifies the dispersion of pixel values around the mean within a given region. By calculating the ratio of a CU's MAD to its pixel mean, we classify 64×64 CUs as smooth or complex. For smooth CUs, partitioning is terminated early to avoid redundant rate-distortion optimization (RDO) computations. For complex CUs, we propose a lightweight ResNet (Residual Neural Network) model that replaces standard convolutions with depthwise separable convolutions (DSC) to reduce the number of parameters. The model integrates local and global features to predict partitioning at various depths, and takes the quantization parameter (QP) as an additional input to improve prediction accuracy. Experimental results indicate that, compared with the original HTM-16.2 method, the proposed approach reduces encoding time by 48.16% while increasing BDBR by only 0.28%.
DOI: 10.1109/ACCESS.2025.3581930
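
Illustrative sketch (not part of the catalog record): the abstract describes two ideas that are easy to show in a few lines, namely screening a CU by the ratio of its MAD to its pixel mean, and the parameter saving from depthwise separable convolutions. The Python below is a minimal sketch under stated assumptions, not the authors' implementation. The function names and the threshold value 0.02 are hypothetical; the abstract does not give the threshold the paper uses. The parameter counts use the standard formulas for a k×k convolution and ignore bias terms.

```python
from statistics import fmean


def mad_ratio(block):
    """Return MAD / mean for a 2D block of non-negative depth samples."""
    pixels = [p for row in block for p in row]
    mean = fmean(pixels)
    if mean == 0:
        # With non-negative samples, a zero mean means an all-zero (flat) block.
        return 0.0
    mad = fmean([abs(p - mean) for p in pixels])
    return mad / mean


def classify_cu(block, threshold=0.02):
    """Label a CU 'smooth' or 'complex'. The threshold is a hypothetical value."""
    return "smooth" if mad_ratio(block) < threshold else "complex"


def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def dsc_params(k, c_in, c_out):
    """Weights in a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out


if __name__ == "__main__":
    flat = [[128] * 64 for _ in range(64)]             # uniform depth
    edge = [[0] * 32 + [200] * 32 for _ in range(64)]  # sharp vertical edge
    print(classify_cu(flat))        # smooth
    print(classify_cu(edge))        # complex
    print(conv_params(3, 64, 64))   # 36864
    print(dsc_params(3, 64, 64))    # 4672
```

In this sketch a CU labeled smooth would skip further quadtree splitting, while a complex CU would be passed to the partition-prediction network. The last two lines show why the substitution shrinks the model: for a 3×3 layer with 64 input and 64 output channels, the separable form needs roughly one eighth of the weights.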