Zero-Shot Semantic Communication With Multimodal Foundation Models

Bibliographic details
Published in: IEEE Transactions on Vehicular Technology, pp. 1-6
Main authors: Hu, Jiangjing; Wu, Haotian; Zhang, Wenjing; Wang, Fengyu; Xu, Wenjun; Gao, Hui; Gunduz, Deniz
Format: Journal Article
Language: English
Published: IEEE 2025
ISSN: 0018-9545, 1939-9359
Abstract Most existing semantic communication (SemCom) systems use deep joint source-channel coding (DeepJSCC) to encode task-specific semantics in a goal-oriented manner. However, their reliance on predefined tasks and datasets significantly limits their flexibility and generalizability in practical deployments. Multi-modal foundation models provide a promising solution by generating universal semantic tokens. Inspired by this, in this paper, we propose SemCLIP, a zero-shot SemCom framework leveraging the contrastive language-image pre-training (CLIP) model. CLIP-generated image tokens are transmitted in SemCLIP under low bandwidth and challenging channel conditions, facilitating diverse zero-shot applications. Specifically, we propose a DeepJSCC scheme for efficient CLIP token encoding. To mitigate potential degradation caused by compression and channel noise, a multi-modal transmission-aware prompt learning (TAPL) mechanism is designed at the receiver, which adapts prompts based on transmission quality, enhancing system robustness and channel adaptability. Simulation results demonstrate that SemCLIP outperforms the baselines, achieving a 41% improvement in zero-shot performance at low signal-to-noise ratios. Meanwhile, SemCLIP reduces bandwidth usage by more than 50-fold compared to alternative image transmission methods, demonstrating the potential of foundation models towards a generalized, task-agnostic SemCom solution.
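The pipeline described in the abstract (a CLIP image token sent over a noisy channel, then used for zero-shot classification at the receiver) can be illustrated with a toy sketch. This is not the authors' SemCLIP implementation: there is no learned DeepJSCC encoder or TAPL prompt adaptation here, the channel is assumed to be real-valued AWGN, and the four-dimensional embeddings and the names `awgn`, `cosine`, and `zero_shot_label` are hypothetical stand-ins chosen for illustration only.

```python
import math
import random


def awgn(signal, snr_db, rng):
    """Add real-valued white Gaussian noise to `signal` at the given SNR (dB)."""
    power = sum(x * x for x in signal) / len(signal)
    noise_std = math.sqrt(power / (10 ** (snr_db / 10)))
    return [x + rng.gauss(0.0, noise_std) for x in signal]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_label(image_token, text_tokens):
    """Return the label whose text embedding is most similar to the image token."""
    return max(text_tokens, key=lambda label: cosine(image_token, text_tokens[label]))


# Hypothetical toy embeddings standing in for CLIP outputs (not real CLIP values).
rng = random.Random(0)
text_tokens = {
    "cat": [1.0, 0.0, 0.0, 0.0],
    "dog": [0.0, 1.0, 0.0, 0.0],
    "car": [0.0, 0.0, 1.0, 0.0],
}
image_token = [0.9, 0.1, 0.0, 0.1]  # closest to "cat" before transmission

# Assumed channel: the token is sent uncoded over AWGN at 20 dB SNR.
received = awgn(image_token, snr_db=20.0, rng=rng)
print(zero_shot_label(received, text_tokens))
```

Lowering `snr_db` in this sketch makes misclassification more likely, which is the kind of degradation the abstract says the DeepJSCC scheme and TAPL mechanism are designed to mitigate.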
Author_xml – sequence: 1
  givenname: Jiangjing
  surname: Hu
  fullname: Hu, Jiangjing
  organization: Beijing University of Posts and Telecommunications, Beijing, China
– sequence: 2
  givenname: Haotian
  surname: Wu
  fullname: Wu, Haotian
  organization: Department of Electrical and Electronic Engineering, Imperial College London, London, U.K.
– sequence: 3
  givenname: Wenjing
  surname: Zhang
  fullname: Zhang, Wenjing
  organization: Beijing University of Posts and Telecommunications, Beijing, China
– sequence: 4
  givenname: Fengyu
  surname: Wang
  fullname: Wang, Fengyu
  email: fengyu.wang@bupt.edu.cn
  organization: Beijing University of Posts and Telecommunications, Beijing, China
– sequence: 5
  givenname: Wenjun
  surname: Xu
  fullname: Xu, Wenjun
  organization: Beijing University of Posts and Telecommunications, Beijing, China
– sequence: 6
  givenname: Hui
  surname: Gao
  fullname: Gao, Hui
  organization: Beijing University of Posts and Telecommunications, Beijing, China
– sequence: 7
  givenname: Deniz
  surname: Gunduz
  fullname: Gunduz, Deniz
  organization: Department of Electrical and Electronic Engineering, Imperial College London, London, U.K.
CODEN ITVTAB
ContentType Journal Article
DOI 10.1109/TVT.2025.3632893
DatabaseName IEEE Xplore (IEEE)
IEEE All-Society Periodicals Package (ASPP) 1998–Present
IEEE Electronic Library (IEL)
CrossRef
DatabaseTitle CrossRef
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Engineering
EISSN 1939-9359
EndPage 6
ExternalDocumentID 10_1109_TVT_2025_3632893
11249447
Genre orig-research
IEDL.DBID RIE
ISSN 0018-9545
IngestDate Sat Nov 29 06:53:56 EST 2025
Wed Nov 26 07:22:37 EST 2025
IsPeerReviewed true
IsScholarly true
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
LinkModel DirectLink
PageCount 6
ParticipantIDs crossref_primary_10_1109_TVT_2025_3632893
ieee_primary_11249447
PublicationCentury 2000
PublicationDate 2025-00-00
PublicationDateYYYYMMDD 2025-01-01
PublicationDate_xml – year: 2025
  text: 2025-00-00
PublicationDecade 2020
PublicationTitle IEEE transactions on vehicular technology
PublicationTitleAbbrev TVT
PublicationYear 2025
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssj0014491
Score 2.4514177
SourceID crossref
ieee
SourceType Index Database
Publisher
StartPage 1
SubjectTerms Bandwidth
Deep joint source-channel coding
Encoding
Feature extraction
Foundation models
Image reconstruction
prompt optimization
Receivers
Robustness
Semantic communication
semantic communications
Signal to noise ratio
token communications
Vectors
Title Zero-Shot Semantic Communication With Multimodal Foundation Models
URI https://ieeexplore.ieee.org/document/11249447
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
journalDatabaseRights – providerCode: PRVIEE
  databaseName: IEEE Electronic Library (IEL)
  customDbUrl:
  eissn: 1939-9359
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0014491
  issn: 0018-9545
  databaseCode: RIE
  dateStart: 19670101
  isFulltext: true
  titleUrlDefault: https://ieeexplore.ieee.org/
  providerName: IEEE