Achieving the Optimum Rate for Cross-Modal Source Coding

Bibliographic details
Published in: IEEE Transactions on Multimedia, vol. 26, pp. 9722-9735
Main authors: Yuan, Zhe; Wu, Dan; Zhou, Liang
Format: Journal Article
Language: English
Published: IEEE, 2024
ISSN: 1520-9210, 1941-0077
Online access: Full text
Description
Abstract: Multi-modal applications are expected to dominate in the 5G and beyond-5G (B5G) era. Traditional source coding methods, however, are neither efficient nor reliable for such applications, because they neglect the semantic redundancy and mutual influence between the sources of different modalities. Cross-modal source coding (CMSC) has been proposed as a promising solution, but two main challenges remain: determining the optimum rate of CMSC under delay and reliability constraints, and designing a practical CMSC scheme that operates near that rate. This paper studies the optimum source coding rate of CMSC and its practical implementation. On the theoretical side, an $(n,\epsilon)$-achievable rate region is derived, representing the source coding rates attainable at a fixed blocklength $n$ and a target error probability $\epsilon$. The optimum source coding rate can then be approximated by computing the infimum of the $(n,\epsilon)$-achievable rate region with a rate dispersion function. On the technical side, a general implementation of CMSC is proposed, which fully leverages channel coding and artificial intelligence (AI) semantic analysis to achieve the optimum rate. Numerical results demonstrate that, when the multi-modal sources are semantically correlated, CMSC obtains a 50% improvement in theory and a 37.5% improvement in practice over a baseline model abstracted from traditional schemes.
DOI:10.1109/TMM.2024.3397192
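
Illustrative note: the abstract describes approximating the optimum rate at blocklength $n$ and error probability $\epsilon$ through a rate dispersion function. The record does not give the paper's cross-modal expression, so the sketch below is not the authors' result. It only shows the classical second-order (normal) approximation for a single discrete memoryless source, $R(n,\epsilon) \approx H + \sqrt{V/n}\,Q^{-1}(\epsilon)$, with higher-order terms dropped. The function names and the example distributions are assumptions made for illustration.

```python
from math import log2, sqrt
from statistics import NormalDist


def entropy_and_dispersion(pmf):
    """Return (H, V) in bits for a discrete memoryless source.

    H is the entropy; V is the variance of the information
    density -log2 P(X), often called the source dispersion.
    """
    probs = [p for p in pmf if p > 0]
    h = -sum(p * log2(p) for p in probs)
    v = sum(p * (-log2(p) - h) ** 2 for p in probs)
    return h, v


def approx_rate(pmf, n, eps):
    """Normal approximation H + sqrt(V/n) * Q^{-1}(eps), in bits/symbol.

    Assumption: single-source, almost-lossless setting; this is NOT
    the cross-modal rate region derived in the paper.
    """
    h, v = entropy_and_dispersion(pmf)
    q_inv = NormalDist().inv_cdf(1.0 - eps)  # inverse Gaussian tail function
    return h + sqrt(v / n) * q_inv


if __name__ == "__main__":
    skewed = [0.9, 0.1]  # hypothetical binary source
    for n in (100, 1000, 10000):
        print(n, round(approx_rate(skewed, n, eps=0.01), 4))
```

For a fixed $\epsilon < 0.5$, the approximated rate decreases toward the entropy as $n$ grows, which is the finite-blocklength penalty that a delay constraint imposes.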