SoundSpring: Loss-Resilient Audio Transceiver With Dual-Functional Masked Language Modeling


Bibliographic details
Published in:	IEEE Journal on Selected Areas in Communications, vol. 43, no. 4, pp. 1308-1322
Main authors:	Yao, Shengshi; Dai, Jincheng; Qin, Xiaoqi; Wang, Sixian; Wang, Siye; Niu, Kai; Zhang, Ping
Format:	Journal Article
Language:	English
Published:	New York: IEEE, 2025-04-01; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:	Audio coding; audio communication; Audio signals; Codecs; Communication; Communications systems; Compressors; Context modeling; Decoding; error-resilient coding; Learning; masked language models; Neural networks; Packet loss; Propagation losses; Receivers; Resilience; Robustness; Semantic-aware transceiver; Transceivers; Transmission error
ISSN:	0733-8716, 1558-0008

Abstract	In this paper, we propose "SoundSpring", an error-resilient audio transceiver that combines the robustness benefits of joint source-channel coding (JSCC) with compatibility with current digital communication systems. Unlike recent deep JSCC transceivers, which learn to map audio signals directly to analog channel-input symbols via neural networks, SoundSpring adopts a layered architecture that separates audio compression from digital coded transmission while fully exploiting the in-context predictive capabilities of large language (foundation) models. Trained with a causal-order mask learning strategy, a single model operates in the latent feature domain and serves two functions: as an efficient audio compressor at the transmitter and as an effective packet loss concealment mechanism at the receiver. By jointly optimizing for both audio compression efficiency and transmission error resiliency, we show that mask-learned language models are powerful contextual predictors, and that our dual-functional compression and concealment framework offers a fresh perspective on applying foundation language models to audio communication. Through extensive experimental evaluations, we establish that SoundSpring evidently outperforms contemporary audio transmission systems in terms of signal fidelity metrics and perceptual quality scores. These findings not only support the practical deployment of SoundSpring in learning-based audio communication systems but also inspire the development of future audio semantic transceivers.
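The dual role described in the abstract (one contextual predictor used for compression at the transmitter and for concealment at the receiver) can be illustrated with a toy sketch. This is not the authors' model: `ToyContextModel`, its smoothed bigram counts, and the four-token vocabulary are hypothetical stand-ins for a masked language model over latent audio tokens, chosen only so the example runs with the Python standard library.

```python
import math
from collections import Counter, defaultdict


class ToyContextModel:
    """Hypothetical stand-in for a masked language model over latent tokens.

    It predicts each token from its left neighbour using smoothed bigram counts.
    """

    def __init__(self, vocab_size):
        self.vocab_size = vocab_size
        self.counts = defaultdict(Counter)

    def fit(self, sequences):
        # Count how often each token follows each context token.
        for seq in sequences:
            for prev, cur in zip(seq, seq[1:]):
                self.counts[prev][cur] += 1

    def prob(self, prev, cur):
        # Laplace-smoothed conditional probability P(cur | prev).
        ctx = self.counts[prev]
        return (ctx[cur] + 1) / (sum(ctx.values()) + self.vocab_size)

    def ideal_bits(self, seq):
        # Transmitter role: ideal entropy-coded length, -log2 P per token.
        # The first token has no context, so it costs a uniform code.
        bits = math.log2(self.vocab_size)
        for prev, cur in zip(seq, seq[1:]):
            bits -= math.log2(self.prob(prev, cur))
        return bits

    def conceal(self, received):
        # Receiver role: replace lost tokens (None) with the most likely one.
        out = list(received)
        for i, tok in enumerate(out):
            if tok is None:
                prev = out[i - 1] if i > 0 else None
                out[i] = max(range(self.vocab_size),
                             key=lambda t: self.prob(prev, t))
        return out


if __name__ == "__main__":
    model = ToyContextModel(vocab_size=4)
    model.fit([[0, 1, 2, 3] * 5])
    print(model.ideal_bits([0, 1, 2, 3, 0, 1, 2, 3]))  # fewer than 16 uniform bits
    print(model.conceal([0, 1, None, 3, None, 1]))
```

The same probabilities drive both functions: a sharper predictor shortens the code at the transmitter and makes the receiver's guesses for lost tokens more likely to be correct.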
Author_xml
1. Yao, Shengshi: Key Laboratory of Universal Wireless Communications, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing, China
2. Dai, Jincheng (ORCID 0000-0002-0310-568X; email daijincheng@bupt.edu.cn): Key Laboratory of Universal Wireless Communications, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing, China
3. Qin, Xiaoqi: State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China
4. Wang, Sixian: Key Laboratory of Universal Wireless Communications, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing, China
5. Wang, Siye: Key Laboratory of Universal Wireless Communications, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing, China
6. Niu, Kai: Key Laboratory of Universal Wireless Communications, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing, China
7. Zhang, Ping: State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China
CODEN	ISACEM
Copyright	Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2025
DOI	10.1109/JSAC.2025.3531406
Discipline	Engineering
Genre	orig-research
GrantInformation_xml	Program for Youth Innovative Research Team of BUPT (grant 2023YQTD02); National Key Research and Development Program of China (grant 2024YFF0509700); Beijing Municipal Natural Science Foundation (grants L232047, 4222012); National Natural Science Foundation of China (grants 62321001, 62293481, 62371063, 92267301, 62201089; funder ID 10.13039/501100001809)
IsPeerReviewed	true
IsScholarly	true
License	https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037
PageCount	15
PublicationTitleAbbrev	J-SAC
URI	https://ieeexplore.ieee.org/document/10845854 https://www.proquest.com/docview/3178911444