SwinJSCC: Taming Swin Transformer for Deep Joint Source-Channel Coding

Detailed bibliography
Published in: IEEE Transactions on Cognitive Communications and Networking, Vol. 11, No. 1, pp. 90-104
Main authors: Yang, Ke; Wang, Sixian; Dai, Jincheng; Qin, Xiaoqi; Niu, Kai; Zhang, Ping
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 1 February 2025
ISSN: 2332-7731
Abstract: As one of the key techniques to realize semantic communications, end-to-end optimized neural joint source-channel coding (JSCC) has made great progress over the past few years. A general trend in many recent works pushing the model adaptability or the application diversity of neural JSCC is based on the convolutional neural network (CNN) backbone, whose model capacity is yet limited, inherently leading to inferior system coding gain against traditional coded transmission systems. In this paper, we establish a new neural JSCC backbone that can also adapt flexibly to diverse channel conditions and transmission rates within a single model; our open-source project aims to promote research in this field. Specifically, we show that with elaborate design, a neural JSCC codec built on the emerging Swin Transformer backbone achieves superior performance to conventional neural JSCC codecs built upon CNNs, while also requiring lower end-to-end processing latency. Paired with two spatial modulation modules that scale latent representations based on the channel state information and the target transmission rate, our baseline SwinJSCC can further upgrade to a versatile version, which increases its capability to adapt to diverse channel conditions and rate configurations. Extensive experimental results show that our SwinJSCC achieves better or comparable performance versus the state-of-the-art engineered BPG + 5G LDPC coded transmission system with much faster end-to-end coding speed, especially for high-resolution images, where traditional CNN-based JSCC still falls behind due to its limited model capacity.
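For readers who want a concrete picture of the channel-adaptive scaling mentioned in the abstract, the following is a minimal, illustrative sketch (assuming PyTorch; the module and parameter names such as ChannelModulator and snr_db are hypothetical, not the authors' implementation) of how a modulation module might rescale the encoder's latent representation as a function of channel SNR.

# Minimal, illustrative sketch (not the authors' code): an SNR-conditioned
# modulation module that rescales latent features, in the spirit of the
# channel-state modulation described in the abstract. Assumes PyTorch.
import torch
import torch.nn as nn


class ChannelModulator(nn.Module):
    """Predicts a per-channel scaling vector from the channel SNR and
    applies it to the latent representation produced by a JSCC encoder."""

    def __init__(self, latent_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(1, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, latent_dim),
            nn.Sigmoid(),  # scaling factors in (0, 1)
        )

    def forward(self, latent: torch.Tensor, snr_db: torch.Tensor) -> torch.Tensor:
        # latent: (batch, tokens, latent_dim); snr_db: (batch, 1)
        scale = self.mlp(snr_db).unsqueeze(1)  # (batch, 1, latent_dim)
        return latent * scale                  # broadcast over all tokens


if __name__ == "__main__":
    mod = ChannelModulator(latent_dim=256)
    z = torch.randn(2, 64, 256)            # dummy encoder output
    snr = torch.tensor([[0.0], [10.0]])    # channel SNR in dB
    print(mod(z, snr).shape)               # torch.Size([2, 64, 256])

A rate-adaptive module could follow the same pattern, conditioning the scaling vector on the target transmission rate instead of (or in addition to) the SNR.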
Authors:
1. Ke Yang (ORCID: 0009-0008-5530-0509) - Key Laboratory of Universal Wireless Communications, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing, China
2. Sixian Wang (ORCID: 0000-0002-0621-1285) - Key Laboratory of Universal Wireless Communications, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing, China
3. Jincheng Dai (ORCID: 0000-0002-0310-568X; email: daijincheng@bupt.edu.cn) - Key Laboratory of Universal Wireless Communications, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing, China
4. Xiaoqi Qin (ORCID: 0000-0002-5788-0657) - State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China
5. Kai Niu (ORCID: 0000-0002-8076-1867) - Key Laboratory of Universal Wireless Communications, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing, China
6. Ping Zhang (ORCID: 0000-0002-0269-104X) - State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China
CODEN ITCCG7
ContentType Journal Article
Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2025
DOI 10.1109/TCCN.2024.3424842
Discipline Engineering
EISSN 2332-7731
EndPage 104
Genre orig-research
GrantInformation – National Natural Science Foundation of China (grants 62293481, 62371063, 92067202; funder ID 10.13039/501100001809)
– Beijing Natural Science Foundation (grants L232047, 4222012)
– Program for Youth Innovative Research Team of BUPT (grant 2023QNTD02)
ISSN 2332-7731
Issue 1
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
PageCount 15
PublicationDate 2025-02-01
PublicationPlace Piscataway
PublicationTitle IEEE transactions on cognitive communications and networking
PublicationTitleAbbrev TCCN
PublicationYear 2025
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
StartPage 90
SubjectTerms Adaptation models
Artificial neural networks
attention mechanism
Codec
Codes
Coding
Configuration management
Convolutional neural networks
Image coding
image communications
Image resolution
Joint source-channel coding
Network latency
Signal to noise ratio
swin transformer
Transformers
Vectors
Wireless communication
Title SwinJSCC: Taming Swin Transformer for Deep Joint Source-Channel Coding
URI https://ieeexplore.ieee.org/document/10589474
https://www.proquest.com/docview/3164451778
Volume 11