SoundSpring: Loss-Resilient Audio Transceiver With Dual-Functional Masked Language Modeling

In this paper, we propose "SoundSpring", an error-resilient audio transceiver that combines the robustness benefits of joint source-channel coding (JSCC) with compatibility with current digital communication systems. Unlike recent deep JSCC transceivers, which learn to...

Bibliographic details
Published in: IEEE Journal on Selected Areas in Communications, Vol. 43, No. 4, pp. 1308-1322
Main authors: Yao, Shengshi, Dai, Jincheng, Qin, Xiaoqi, Wang, Sixian, Wang, Siye, Niu, Kai, Zhang, Ping
Format: Journal Article
Language: English
Publication details: New York IEEE 01.04.2025
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:
ISSN: 0733-8716, 1558-0008
Abstract In this paper, we propose "SoundSpring", an error-resilient audio transceiver that combines the robustness benefits of joint source-channel coding (JSCC) with compatibility with current digital communication systems. Unlike recent deep JSCC transceivers, which learn to map audio signals directly to analog channel-input symbols via neural networks, SoundSpring adopts a layered architecture that separates audio compression from digital coded transmission while fully exploiting the in-context predictive capabilities of large language (foundation) models. Integrated with a causal-order mask learning strategy, our single model operates in the latent feature domain and serves dual functions: as an efficient audio compressor at the transmitter and as an effective packet loss concealment mechanism at the receiver. By jointly optimizing for both audio compression efficiency and transmission error resiliency, we show that mask-learned language models are powerful contextual predictors, and that our dual-functional compression and concealment framework offers a fresh perspective on applying foundation language models to audio communication. Through extensive experimental evaluations, we establish that SoundSpring outperforms contemporary audio transmission systems in terms of signal fidelity metrics and perceptual quality scores. These findings not only support the practical deployment of SoundSpring in learning-based audio communication systems but also inspire the development of future audio semantic transceivers.
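The dual role described in the abstract (one contextual predictor used both for compression and for loss concealment) can be illustrated with a toy sketch. This is not the authors' model: the record gives no implementation details, so the example below assumes a hypothetical codebook of 4 discrete tokens and swaps in a smoothed bigram table for the masked language model. All names (`fit_bigram`, `code_length_bits`, `conceal`, `K`) are illustrative assumptions. The same probability table serves two purposes: its negative log-probabilities give the ideal entropy-coded length at the transmitter, and its most probable token fills in lost positions at the receiver.

```python
import math
from collections import Counter, defaultdict

K = 4  # hypothetical codebook size (assumption, not taken from the paper)


def fit_bigram(train, k=K, alpha=1.0):
    """Estimate p(tok | prev) with add-alpha smoothing.

    A bigram table is only a stand-in for a learned contextual predictor.
    """
    counts = defaultdict(Counter)
    for prev, tok in zip(train, train[1:]):
        counts[prev][tok] += 1
    probs = {}
    for prev in range(k):
        total = sum(counts[prev].values()) + alpha * k
        probs[prev] = [(counts[prev][t] + alpha) / total for t in range(k)]
    return probs


def code_length_bits(tokens, probs, k=K):
    """Ideal entropy-coded length: first token coded uniformly,
    every later token costs -log2 p(tok | prev) bits."""
    bits = math.log2(k)
    for prev, tok in zip(tokens, tokens[1:]):
        bits += -math.log2(probs[prev][tok])
    return bits


def conceal(received, probs):
    """Replace lost tokens (None) with the most probable token
    given the previous (possibly already concealed) token."""
    out = list(received)
    for i, tok in enumerate(out):
        if tok is None:
            prev = out[i - 1] if i > 0 else 0  # arbitrary fallback context
            out[i] = max(range(len(probs[prev])), key=lambda t: probs[prev][t])
    return out


train = [0, 1, 2, 3] * 50          # highly predictable toy token stream
probs = fit_bigram(train)

message = [0, 1, 2, 3, 0, 1, 2, 3]
bits = code_length_bits(message, probs)       # transmitter side: compression
received = [0, 1, None, 3, 0, None, 2, 3]     # two tokens lost in transit
restored = conceal(received, probs)           # receiver side: concealment
```

On this predictable toy stream the ideal code length is well below the 16 bits of a fixed 2-bit-per-token encoding, and both lost tokens are recovered. A real system would condition on far richer context than a single previous token.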
Author Wang, Sixian
Yao, Shengshi
Niu, Kai
Dai, Jincheng
Zhang, Ping
Qin, Xiaoqi
Wang, Siye
Author_xml – sequence: 1
  givenname: Shengshi
  surname: Yao
  fullname: Yao, Shengshi
  organization: Key Laboratory of Universal Wireless Communications, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing, China
– sequence: 2
  givenname: Jincheng
  orcidid: 0000-0002-0310-568X
  surname: Dai
  fullname: Dai, Jincheng
  email: daijincheng@bupt.edu.cn
  organization: Key Laboratory of Universal Wireless Communications, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing, China
– sequence: 3
  givenname: Xiaoqi
  surname: Qin
  fullname: Qin, Xiaoqi
  organization: State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China
– sequence: 4
  givenname: Sixian
  surname: Wang
  fullname: Wang, Sixian
  organization: Key Laboratory of Universal Wireless Communications, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing, China
– sequence: 5
  givenname: Siye
  surname: Wang
  fullname: Wang, Siye
  organization: Key Laboratory of Universal Wireless Communications, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing, China
– sequence: 6
  givenname: Kai
  surname: Niu
  fullname: Niu, Kai
  organization: Key Laboratory of Universal Wireless Communications, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing, China
– sequence: 7
  givenname: Ping
  surname: Zhang
  fullname: Zhang, Ping
  organization: State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China
CODEN ISACEM
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2025
Copyright_xml – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2025
DOI 10.1109/JSAC.2025.3531406
DatabaseName IEEE Xplore (IEEE)
IEEE All-Society Periodicals Package (ASPP) 1998–Present
IEEE Electronic Library (IEL)
CrossRef
Electronics & Communications Abstracts
Technology Research Database
Advanced Technologies Database with Aerospace
DatabaseTitle CrossRef
Technology Research Database
Advanced Technologies Database with Aerospace
Electronics & Communications Abstracts
DatabaseTitleList
Technology Research Database
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Engineering
EISSN 1558-0008
EndPage 1322
ExternalDocumentID 10_1109_JSAC_2025_3531406
10845854
Genre orig-research
GrantInformation_xml – fundername: Program for Youth Innovative Research Team of BUPT
  grantid: 2023YQTD02
– fundername: National Key Research and Development Program of China
  grantid: 2024YFF0509700
– fundername: Beijing Municipal Natural Science Foundation
  grantid: L232047; 4222012
– fundername: National Natural Science Foundation of China
  grantid: 62321001; 62293481; 62371063; 92267301; 62201089
  funderid: 10.13039/501100001809
IEDL.DBID RIE
ISICitedReferencesCount 0
ISSN 0733-8716
IngestDate Thu Aug 14 02:12:25 EDT 2025
Sat Nov 29 08:06:51 EST 2025
Wed Aug 27 01:45:29 EDT 2025
IsPeerReviewed true
IsScholarly true
Issue 4
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
LinkModel DirectLink
ORCID 0000-0002-0310-568X
PQID 3178911444
PQPubID 85481
PageCount 15
ParticipantIDs proquest_journals_3178911444
ieee_primary_10845854
crossref_primary_10_1109_JSAC_2025_3531406
PublicationCentury 2000
PublicationDate 2025-04-01
PublicationDateYYYYMMDD 2025-04-01
PublicationDate_xml – month: 04
  year: 2025
  text: 2025-04-01
  day: 01
PublicationDecade 2020
PublicationPlace New York
PublicationPlace_xml – name: New York
PublicationTitle IEEE journal on selected areas in communications
PublicationTitleAbbrev J-SAC
PublicationYear 2025
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
SSID ssj0014482
Score 2.4790497
SourceID proquest
crossref
ieee
SourceType Aggregation Database
Index Database
Publisher
StartPage 1308
SubjectTerms Audio coding
audio communication
Audio signals
Codecs
Communication
Communications systems
Compressors
Context modeling
Decoding
error-resilient coding
Learning
masked language models
Neural networks
Packet loss
Propagation losses
Receivers
Resilience
Robustness
Semantic-aware transceiver
Transceivers
Transmission error
Title SoundSpring: Loss-Resilient Audio Transceiver With Dual-Functional Masked Language Modeling
URI https://ieeexplore.ieee.org/document/10845854
https://www.proquest.com/docview/3178911444
Volume 43