Achieving the Optimum Rate for Cross-Modal Source Coding

Multi-modal applications are expected to dominate in the 5G and B5G era. However, traditional source coding methods are not efficient or reliable due to neglecting semantic redundancy and mutual influences between different modalities' sources. To address this, cross-modal source coding (CMSC)...

Bibliographic details
Published in: IEEE Transactions on Multimedia, Vol. 26, pp. 9722-9735
Main authors: Yuan, Zhe; Wu, Dan; Zhou, Liang
Format: Journal Article
Language: English
Published: IEEE 2024
ISSN:1520-9210, 1941-0077
Description
Summary: Multi-modal applications are expected to dominate in the 5G and B5G era. However, traditional source coding methods are neither efficient nor reliable because they neglect semantic redundancy and the mutual influence between sources of different modalities. To address this, cross-modal source coding (CMSC) has been proposed as a promising solution. Two main challenges remain: determining the optimum rate of CMSC under delay and reliability constraints, and designing a practical CMSC scheme that operates near the optimum rate. To tackle these challenges, this paper studies the optimum source coding rate of CMSC and its practical implementation. On the theoretical side, an $(n,\epsilon)$-achievable rate region is derived, representing the source coding rates subject to a fixed blocklength $n$ and a target error probability $\epsilon$. Additionally, the optimum source coding rate can be approximated by calculating the infimum of the $(n,\epsilon)$-achievable rate region with a rate dispersion function. On the technical side, a general implementation of CMSC is proposed, which fully leverages channel coding and artificial intelligence (AI) semantic analysis to achieve the optimum rate. Numerical results demonstrate that, when multi-modal sources are semantically correlated, CMSC obtains a 50% improvement in theory and a 37.5% improvement in practice over a baseline model abstracted from traditional schemes.
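The summary does not state the paper's rate expression, so the following is only a generic illustration of what a dispersion-based approximation at blocklength $n$ and error probability $\epsilon$ looks like. It assumes the classical single-source normal approximation for a discrete memoryless source, $R(n,\epsilon) \approx H + \sqrt{V/n}\,Q^{-1}(\epsilon)$, where $H$ is the entropy, $V$ is the variance of the information content (the dispersion), and higher-order terms are omitted. The function names and the example distribution are hypothetical, and this is not the CMSC rate region derived in the paper.

```python
import math
from statistics import NormalDist


def entropy_and_varentropy(pmf):
    """Return (H, V): entropy in bits and variance of -log2 p(X) in bits^2."""
    probs = [p for p in pmf if p > 0]
    info = [-math.log2(p) for p in probs]  # information content of each symbol
    h = sum(p * i for p, i in zip(probs, info))
    v = sum(p * (i - h) ** 2 for p, i in zip(probs, info))
    return h, v


def q_inverse(eps):
    """Inverse of the Gaussian tail function Q(x) = 1 - Phi(x)."""
    return NormalDist().inv_cdf(1.0 - eps)


def normal_approx_rate(pmf, n, eps):
    """Two-term normal approximation of the rate (bits/symbol).

    Assumption: single memoryless source, higher-order terms dropped.
    """
    h, v = entropy_and_varentropy(pmf)
    return h + math.sqrt(v / n) * q_inverse(eps)


if __name__ == "__main__":
    pmf = [0.11, 0.89]  # hypothetical binary source
    for n in (100, 1000, 10000):
        print(n, normal_approx_rate(pmf, n, eps=0.01))
```

Under this approximation the rate penalty above the entropy shrinks as $1/\sqrt{n}$, which is why a fixed blocklength (a delay constraint) and a target error probability (a reliability constraint) jointly determine the achievable rate.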
DOI:10.1109/TMM.2024.3397192