Enhancing into the Codec: Noise Robust Speech Coding with Vector-Quantized Autoencoders
Audio codecs based on discretized neural autoencoders have recently been developed and shown to provide significantly higher compression levels for comparable quality speech output. However, these models are tightly coupled with speech content, and produce unintended outputs in noisy conditions. Ba...
| Published in: | Proceedings of the ... IEEE International Conference on Acoustics, Speech and Signal Processing (1998), pp. 711-715 |
|---|---|
| Main authors: | Casebeer, Jonah; Vale, Vinjai; Isik, Umut; Valin, Jean-Marc; Giri, Ritwik; Krishnaswamy, Arvindh |
| Format: | Conference proceeding |
| Language: | English |
| Published: | IEEE, 6 June 2021 |
| Subjects: | |
| ISSN: | 2379-190X |
| Online access: | Full text |
| Abstract | Audio codecs based on discretized neural autoencoders have recently been developed and shown to provide significantly higher compression levels for comparable quality speech output. However, these models are tightly coupled with speech content, and produce unintended outputs in noisy conditions. Based on VQ-VAE autoencoders with WaveRNN decoders, we develop compressor-enhancer encoders and accompanying decoders, and show that they operate well in noisy conditions. We also observe that a compressor-enhancer model performs better on clean speech inputs than a compressor model trained only on clean speech. |
|---|---|
| Author | Vale, Vinjai; Isik, Umut; Valin, Jean-Marc; Casebeer, Jonah; Krishnaswamy, Arvindh; Giri, Ritwik |
| Author_xml | – sequence: 1 givenname: Jonah surname: Casebeer fullname: Casebeer, Jonah organization: University of Illinois at Urbana-Champaign – sequence: 2 givenname: Vinjai surname: Vale fullname: Vale, Vinjai organization: Stanford University – sequence: 3 givenname: Umut surname: Isik fullname: Isik, Umut organization: Amazon Web Services – sequence: 4 givenname: Jean-Marc surname: Valin fullname: Valin, Jean-Marc organization: Amazon Web Services – sequence: 5 givenname: Ritwik surname: Giri fullname: Giri, Ritwik organization: Amazon Web Services – sequence: 6 givenname: Arvindh surname: Krishnaswamy fullname: Krishnaswamy, Arvindh organization: Amazon Web Services |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IH CBEJK RIE RIO |
| DOI | 10.1109/ICASSP39728.2021.9414605 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings IEEE Proceedings Order Plan (POP) 1998-present by volume IEEE Xplore All Conference Proceedings IEEE Electronic Library (IEL) IEEE Proceedings Order Plans (POP) 1998-present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Engineering |
| EISBN | 9781728176055 1728176050 |
| EISSN | 2379-190X |
| EndPage | 715 |
| ExternalDocumentID | 9414605 |
| Genre | orig-research |
| GroupedDBID | 23M 6IE 6IF 6IH 6IK 6IL 6IM 6IN AAJGR AAWTH ABLEC ACGFS ADZIZ ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK CHZPO IEGSK IJVOP IPLJI M43 OCL RIE RIL RIO RNS |
| ID | FETCH-LOGICAL-i203t-c30aa447d645443e828bf4740a80840400b31acf95c7d485e347b50c791b72703 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 13 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000704288400143&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| IngestDate | Wed Aug 27 02:39:02 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-i203t-c30aa447d645443e828bf4740a80840400b31acf95c7d485e347b50c791b72703 |
| PageCount | 5 |
| ParticipantIDs | ieee_primary_9414605 |
| PublicationCentury | 2000 |
| PublicationDate | 2021-June-6 |
| PublicationDateYYYYMMDD | 2021-06-06 |
| PublicationDate_xml | – month: 06 year: 2021 text: 2021-June-6 day: 06 |
| PublicationDecade | 2020 |
| PublicationTitle | Proceedings of the ... IEEE International Conference on Acoustics, Speech and Signal Processing (1998) |
| PublicationTitleAbbrev | ICASSP |
| PublicationYear | 2021 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SSID | ssj0008748 |
| Score | 2.3270369 |
| Snippet | Audio codecs based on discretized neural autoencoders have recently been developed and shown to provide significantly higher compression levels for comparable... |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 711 |
| SubjectTerms | audio compression; Codecs; Convolution; Convolutional codes; Decoding; Speech coding; Speech enhancement; Training |
| Title | Enhancing into the Codec: Noise Robust Speech Coding with Vector-Quantized Autoencoders |
| URI | https://ieeexplore.ieee.org/document/9414605 |
| WOSCitedRecordID | wos000704288400143&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |