SwinJSCC: Taming Swin Transformer for Deep Joint Source-Channel Coding
As one of the key techniques to realize semantic communications, end-to-end optimized neural joint source-channel coding (JSCC) has made great progress over the past few years. A general trend in many recent works pushing the model adaptability or the application diversity of neural JSCC is based on...
| Published in: | IEEE Transactions on Cognitive Communications and Networking, Vol. 11, No. 1, pp. 90-104 |
|---|---|
| Main authors: | Ke Yang, Sixian Wang, Jincheng Dai, Xiaoqi Qin, Kai Niu, Ping Zhang |
| Format: | Journal Article |
| Language: | English |
| Published: | Piscataway: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 2025-02-01 |
| ISSN: | 2332-7731 |
| Abstract | As one of the key techniques for realizing semantic communications, end-to-end optimized neural joint source-channel coding (JSCC) has made great progress over the past few years. Many recent works that push the model adaptability or application diversity of neural JSCC build on a convolutional neural network (CNN) backbone, whose model capacity remains limited, inherently leading to inferior system coding gain compared with traditional coded transmission systems. In this paper, we establish a new neural JSCC backbone that can also adapt flexibly to diverse channel conditions and transmission rates within a single model; our open-source project aims to promote research in this field. Specifically, we show that, with elaborate design, a neural JSCC codec built on the emerging Swin Transformer backbone achieves better performance than conventional neural JSCC codecs built upon CNNs while also requiring lower end-to-end processing latency. Paired with two spatial modulation modules that scale latent representations based on the channel state information and the target transmission rate, our baseline SwinJSCC can be further upgraded to a versatile version with increased capability to adapt to diverse channel conditions and rate configurations. Extensive experimental results show that SwinJSCC achieves better or comparable performance versus the state-of-the-art engineered BPG + 5G LDPC coded transmission system with much faster end-to-end coding speed, especially for high-resolution images, a case in which traditional CNN-based JSCC still falls behind due to its limited model capacity. |
|---|---|
| Author | Wang, Sixian; Niu, Kai; Dai, Jincheng; Yang, Ke; Zhang, Ping; Qin, Xiaoqi |
| Author_xml | 1. Yang, Ke (ORCID 0009-0008-5530-0509), Key Laboratory of Universal Wireless Communications, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing, China; 2. Wang, Sixian (ORCID 0000-0002-0621-1285), Key Laboratory of Universal Wireless Communications, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing, China; 3. Dai, Jincheng (ORCID 0000-0002-0310-568X, daijincheng@bupt.edu.cn), Key Laboratory of Universal Wireless Communications, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing, China; 4. Qin, Xiaoqi (ORCID 0000-0002-5788-0657), State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China; 5. Niu, Kai (ORCID 0000-0002-8076-1867), Key Laboratory of Universal Wireless Communications, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing, China; 6. Zhang, Ping (ORCID 0000-0002-0269-104X), State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China |
| CODEN | ITCCG7 |
| CitedBy_id | crossref_primary_10_1109_JSEN_2025_3542396 crossref_primary_10_1007_s11277_025_11791_7 crossref_primary_10_1016_j_dcan_2025_06_010 crossref_primary_10_1038_s41598_025_16753_4 crossref_primary_10_3390_s25010269 crossref_primary_10_1109_ACCESS_2025_3546514 crossref_primary_10_3390_bdcc9090240 crossref_primary_10_1117_1_JEI_34_2_023052 crossref_primary_10_1109_OJCOMS_2025_3548079 crossref_primary_10_1109_JSAC_2025_3559158 crossref_primary_10_1109_LWC_2025_3574719 crossref_primary_10_1109_JSAC_2025_3559115 crossref_primary_10_1109_LWC_2025_3561012 crossref_primary_10_1364_OE_568127 crossref_primary_10_1109_TBC_2025_3559003 crossref_primary_10_1109_ACCESS_2025_3607699 |
| ContentType | Journal Article |
| Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2025 |
| DOI | 10.1109/TCCN.2024.3424842 |
| Discipline | Engineering |
| EISSN | 2332-7731 |
| EndPage | 104 |
| Genre | orig-research |
| GrantInformation_xml | National Natural Science Foundation of China (funder ID 10.13039/501100001809), grants 62293481, 62371063, 92067202; Beijing Natural Science Foundation, grants L232047, 4222012; Program for Youth Innovative Research Team of BUPT, grant 2023QNTD02 |
| ISICitedReferencesCount | 34 |
| ISSN | 2332-7731 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Issue | 1 |
| Language | English |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
| ORCID | 0009-0008-5530-0509 0000-0002-5788-0657 0000-0002-8076-1867 0000-0002-0621-1285 0000-0002-0269-104X 0000-0002-0310-568X |
| PageCount | 15 |
| PublicationCentury | 2000 |
| PublicationDate | 2025-02-01 |
| PublicationPlace | Piscataway |
| PublicationTitle | IEEE Transactions on Cognitive Communications and Networking |
| PublicationTitleAbbrev | TCCN |
| PublicationYear | 2025 |
| Publisher | IEEE; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| StartPage | 90 |
| SubjectTerms | Adaptation models; Artificial neural networks; attention mechanism; Codec; Codes; Coding; Configuration management; Convolutional neural networks; Image coding; image communications; Image resolution; Joint source-channel coding; Network latency; Signal to noise ratio; swin transformer; Transformers; Vectors; Wireless communication |
| Title | SwinJSCC: Taming Swin Transformer for Deep Joint Source-Channel Coding |
| URI | https://ieeexplore.ieee.org/document/10589474 https://www.proquest.com/docview/3164451778 |
| Volume | 11 |
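
The abstract describes modules that scale latent representations according to the channel state information and the target transmission rate before transmission over a noisy channel. The following is a minimal sketch of that idea, not the authors' implementation: `toy_snr_gains` is a hypothetical, hand-written stand-in for what the paper describes as a learned module, `keep_first_k` is an assumed form of rate control, and the real-valued AWGN channel with unit-power normalization is a common simplification chosen here for illustration.

```python
import math
import random


def scale_latent(latent, gains):
    """Element-wise scaling of a latent vector by per-element gains."""
    if len(latent) != len(gains):
        raise ValueError("latent and gains must have the same length")
    return [x * g for x, g in zip(latent, gains)]


def toy_snr_gains(n, snr_db):
    """Hypothetical stand-in for a learned, SNR-conditioned gain vector.

    This hand-written rule only illustrates that the gains depend on the
    channel state; it is not taken from the paper.
    """
    snr_linear = 10 ** (snr_db / 10)
    # Later elements are attenuated more strongly when the SNR is low.
    return [snr_linear / (snr_linear + i / n) for i in range(n)]


def keep_first_k(latent, k):
    """Assumed rate control: transmit only the first k latent elements."""
    if not 0 < k <= len(latent):
        raise ValueError("k must be between 1 and len(latent)")
    return latent[:k]


def power_normalize(symbols):
    """Rescale symbols so that their average power equals 1."""
    power = sum(s * s for s in symbols) / len(symbols)
    if power == 0:
        raise ValueError("cannot normalize an all-zero vector")
    scale = 1.0 / math.sqrt(power)
    return [s * scale for s in symbols]


def awgn(symbols, snr_db, rng):
    """Add real Gaussian noise; for unit signal power the noise variance
    is 10^(-snr_db / 10)."""
    sigma = math.sqrt(10 ** (-snr_db / 10))
    return [s + rng.gauss(0.0, sigma) for s in symbols]


if __name__ == "__main__":
    rng = random.Random(0)
    latent = [rng.uniform(-1, 1) for _ in range(16)]
    scaled = scale_latent(latent, toy_snr_gains(len(latent), snr_db=10.0))
    transmitted = power_normalize(keep_first_k(scaled, 8))
    received = awgn(transmitted, 10.0, rng)
    print(len(received), "symbols received")
```

In this sketch, lowering `k` reduces the number of transmitted symbols, and the SNR enters twice: once through the gains and once through the channel noise. A trained system would replace the hand-written gain rule with learned modules.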