Enhancing into the Codec: Noise Robust Speech Coding with Vector-Quantized Autoencoders
Audio codecs based on discretized neural autoencoders have recently been developed and shown to provide significantly higher compression levels for comparable quality speech output. However, these models are tightly coupled with speech content, and produce unintended outputs in noisy conditions. Ba...
| Published in: | Proceedings of the ... IEEE International Conference on Acoustics, Speech and Signal Processing (1998) pp. 711 - 715 |
|---|---|
| Main Authors: | Casebeer, Jonah; Vale, Vinjai; Isik, Umut; Valin, Jean-Marc; Giri, Ritwik; Krishnaswamy, Arvindh |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 06.06.2021 |
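| Subjects: | audio compression; Codecs; Convolution; Convolutional codes; Decoding; Speech coding; Speech enhancement; Training |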
| ISSN: | 2379-190X |
| Abstract | Audio codecs based on discretized neural autoencoders have recently been developed and shown to provide significantly higher compression levels for comparable quality speech output. However, these models are tightly coupled with speech content, and produce unintended outputs in noisy conditions. Based on VQ-VAE autoencoders with WaveRNN decoders, we develop compressor-enhancer encoders and accompanying decoders, and show that they operate well in noisy conditions. We also observe that a compressor-enhancer model performs better on clean speech inputs than a compressor model trained only on clean speech. |
|---|---|
| Author | Vale, Vinjai; Isik, Umut; Valin, Jean-Marc; Casebeer, Jonah; Krishnaswamy, Arvindh; Giri, Ritwik |
| Author_xml | 1. Casebeer, Jonah (University of Illinois at Urbana-Champaign); 2. Vale, Vinjai (Stanford University); 3. Isik, Umut (Amazon Web Services); 4. Valin, Jean-Marc (Amazon Web Services); 5. Giri, Ritwik (Amazon Web Services); 6. Krishnaswamy, Arvindh (Amazon Web Services) |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IH CBEJK RIE RIO |
| DOI | 10.1109/ICASSP39728.2021.9414605 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan (POP) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE/IET Electronic Library; IEEE Proceedings Order Plans (POP) 1998-present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Engineering |
| EISBN | 9781728176055 1728176050 |
| EISSN | 2379-190X |
| EndPage | 715 |
| ExternalDocumentID | 9414605 |
| Genre | orig-research |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 13 |
| IngestDate | Wed Aug 27 02:39:02 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| LinkModel | DirectLink |
| PageCount | 5 |
| ParticipantIDs | ieee_primary_9414605 |
| PublicationCentury | 2000 |
| PublicationDate | 2021-June-6 |
| PublicationDateYYYYMMDD | 2021-06-06 |
| PublicationDate_xml | – month: 06 year: 2021 text: 2021-June-6 day: 06 |
| PublicationDecade | 2020 |
| PublicationTitle | Proceedings of the ... IEEE International Conference on Acoustics, Speech and Signal Processing (1998) |
| PublicationTitleAbbrev | ICASSP |
| PublicationYear | 2021 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SSID | ssj0008748 |
| Score | 2.3270369 |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 711 |
| SubjectTerms | audio compression; Codecs; Convolution; Convolutional codes; Decoding; Speech coding; Speech enhancement; Training |
| Title | Enhancing into the Codec: Noise Robust Speech Coding with Vector-Quantized Autoencoders |
| URI | https://ieeexplore.ieee.org/document/9414605 |