SoundSpring: Loss-Resilient Audio Transceiver With Dual-Functional Masked Language Modeling
| Published in: | IEEE Journal on Selected Areas in Communications, Vol. 43, No. 4, pp. 1308-1322 |
|---|---|
| Main authors: | Yao, Shengshi; Dai, Jincheng; Qin, Xiaoqi; Wang, Sixian; Wang, Siye; Niu, Kai; Zhang, Ping |
| Format: | Journal Article |
| Language: | English |
| Publication details: | New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 1 April 2025 |
| ISSN: | 0733-8716, 1558-0008 |
| Abstract | In this paper, we propose "SoundSpring", an error-resilient audio transceiver that combines the robustness benefits of joint source-channel coding (JSCC) with compatibility with current digital communication systems. Unlike recent deep JSCC transceivers, which learn to map audio signals directly to analog channel-input symbols via neural networks, SoundSpring adopts a layered architecture that separates audio compression from digital coded transmission, while fully exploiting the in-context predictive capabilities of large language (foundation) models. Combined with a causal-order mask learning strategy, a single model operates in the latent feature domain and serves two functions: as an efficient audio compressor at the transmitter and as an effective packet loss concealment mechanism at the receiver. By jointly optimizing for audio compression efficiency and transmission error resilience, we show that mask-learned language models are powerful contextual predictors, and that our dual-functional compression and concealment framework offers fresh perspectives on the application of foundation language models in audio communication. Through extensive experimental evaluations, we establish that SoundSpring clearly outperforms contemporary audio transmission systems in terms of signal fidelity metrics and perceptual quality scores. These findings not only support the practical deployment of SoundSpring in learning-based audio communication systems but also point toward future audio semantic transceivers. |
|---|---|
| Author | Wang, Sixian; Yao, Shengshi; Niu, Kai; Dai, Jincheng; Zhang, Ping; Qin, Xiaoqi; Wang, Siye |
| Author_xml | 1. Yao, Shengshi: Key Laboratory of Universal Wireless Communications, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing, China; 2. Dai, Jincheng (ORCID 0000-0002-0310-568X, daijincheng@bupt.edu.cn): Key Laboratory of Universal Wireless Communications, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing, China; 3. Qin, Xiaoqi: State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China; 4. Wang, Sixian: Key Laboratory of Universal Wireless Communications, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing, China; 5. Wang, Siye: Key Laboratory of Universal Wireless Communications, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing, China; 6. Niu, Kai: Key Laboratory of Universal Wireless Communications, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing, China; 7. Zhang, Ping: State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China |
| CODEN | ISACEM |
| ContentType | Journal Article |
| Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2025 |
| DOI | 10.1109/JSAC.2025.3531406 |
| DatabaseName | IEEE Xplore (IEEE) IEEE All-Society Periodicals Package (ASPP) 1998–Present IEEE Electronic Library (IEL) CrossRef Electronics & Communications Abstracts Technology Research Database Advanced Technologies Database with Aerospace |
| Discipline | Engineering |
| EISSN | 1558-0008 |
| EndPage | 1322 |
| Genre | orig-research |
| GrantInformation_xml | Program for Youth Innovative Research Team of BUPT (grant 2023YQTD02); National Key Research and Development Program of China (grant 2024YFF0509700); Beijing Municipal Natural Science Foundation (grants L232047 and 4222012); National Natural Science Foundation of China (grants 62321001, 62293481, 62371063, 92267301, and 62201089; funder ID 10.13039/501100001809) |
| ISICitedReferencesCount | 0 |
| ISSN | 0733-8716 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 4 |
| Language | English |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
| ORCID | 0000-0002-0310-568X |
| PageCount | 15 |
| PublicationCentury | 2000 |
| PublicationDate | 2025-04-01 |
| PublicationDecade | 2020 |
| PublicationPlace | New York |
| PublicationTitle | IEEE Journal on Selected Areas in Communications |
| PublicationTitleAbbrev | J-SAC |
| PublicationYear | 2025 |
| Publisher | IEEE; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| StartPage | 1308 |
| SubjectTerms | Audio coding; audio communication; Audio signals; Codecs; Communication; Communications systems; Compressors; Context modeling; Decoding; error-resilient coding; Learning; masked language models; Neural networks; Packet loss; Propagation losses; Receivers; Resilience; Robustness; Semantic-aware transceiver; Transceivers; Transmission error |
| Title | SoundSpring: Loss-Resilient Audio Transceiver With Dual-Functional Masked Language Modeling |
| URI | https://ieeexplore.ieee.org/document/10845854 https://www.proquest.com/docview/3178911444 |
| Volume | 43 |
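
The abstract's central idea is that one context model can serve both as the probability model for compression at the transmitter and as the predictor for packet loss concealment at the receiver. The toy Python sketch below illustrates that duality under loudly stated assumptions; it is not the authors' method. A smoothed bigram counter (the hypothetical `BigramPredictor`) stands in for the masked Transformer, small made-up integers stand in for quantized latent tokens, and the ideal code length of -log2 p bits per token is reported instead of running an actual arithmetic coder. The hypothetical `conceal` helper fills lost tokens left to right, loosely mirroring a causal prediction order.

```python
import math
from collections import Counter, defaultdict


class BigramPredictor:
    """Hypothetical stand-in for a masked language model over latent tokens.

    Predicts each token from its left neighbour only, with add-alpha
    smoothing so every token keeps a nonzero probability.
    """

    def __init__(self, vocab_size, alpha=1.0):
        self.vocab_size = vocab_size
        self.alpha = alpha
        self.counts = defaultdict(Counter)  # counts[prev][cur]

    def fit(self, sequences):
        for seq in sequences:
            for prev, cur in zip(seq, seq[1:]):
                self.counts[prev][cur] += 1
        return self

    def prob(self, token, prev):
        row = self.counts[prev]
        total = sum(row.values())
        return (row[token] + self.alpha) / (total + self.alpha * self.vocab_size)

    def most_likely(self, prev):
        return max(range(self.vocab_size), key=lambda t: self.prob(t, prev))


def ideal_code_length_bits(model, seq):
    """Compression role: bits an ideal entropy coder would spend on seq.

    The first token is charged log2(V) bits (uniform); every later token
    costs -log2 p(token | previous token) under the model.
    """
    bits = math.log2(model.vocab_size)
    for prev, cur in zip(seq, seq[1:]):
        bits -= math.log2(model.prob(cur, prev))
    return bits


def conceal(model, received):
    """Concealment role: replace lost tokens (None) with predictions.

    Gaps are filled left to right, so a filled token becomes context for
    the next one. Index 0 falls back to token 0 (an arbitrary assumption).
    """
    out = list(received)
    for i, tok in enumerate(out):
        if tok is None:
            prev = out[i - 1] if i > 0 else 0
            out[i] = model.most_likely(prev)
    return out


if __name__ == "__main__":
    # Made-up token streams standing in for quantized latent frames.
    model = BigramPredictor(vocab_size=4).fit([[0, 1, 2, 3] * 8, [0, 1, 2, 3, 0, 1, 2, 3]])
    seq = [0, 1, 2, 3, 0, 1, 2, 3]
    fixed = len(seq) * math.log2(model.vocab_size)
    print(f"{ideal_code_length_bits(model, seq):.2f} bits vs {fixed:.0f} bits fixed-length")
    # Suppose tokens 2 and 3 were carried by a lost packet.
    print(conceal(model, [0, 1, None, None, 0, 1, 2, 3]))
```

In this sketch, a sharper predictive distribution both shortens the code length and makes gap filling more accurate, which is the intuition for training one model for both roles. The bigram sees only one neighbour; a learned masked model would condition on far wider context, and a real transceiver would pair the probabilities with an actual entropy coder.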