Enhancing into the Codec: Noise Robust Speech Coding with Vector-Quantized Autoencoders
Audio codecs based on discretized neural autoencoders have recently been developed and shown to provide significantly higher compression levels for comparable quality speech output. However, these models are tightly coupled with speech content, and produce unintended outputs in noisy conditions. Ba...
| Published in: | Proceedings of the ... IEEE International Conference on Acoustics, Speech and Signal Processing (1998) pp. 711-715 |
|---|---|
| Main authors: | Jonah Casebeer, Vinjai Vale, Umut Isik, Jean-Marc Valin, Ritwik Giri, Arvindh Krishnaswamy |
| Format: | Conference paper |
| Language: | English |
| Published: | IEEE, 06.06.2021 |
| Subjects: | |
| ISSN: | 2379-190X |
| Online access: | Get full text |
| Abstract | Audio codecs based on discretized neural autoencoders have recently been developed and shown to provide significantly higher compression levels for comparable quality speech output. However, these models are tightly coupled with speech content, and produce unintended outputs in noisy conditions. Based on VQ-VAE autoencoders with WaveRNN decoders, we develop compressor-enhancer encoders and accompanying decoders, and show that they operate well in noisy conditions. We also observe that a compressor-enhancer model performs better on clean speech inputs than a compressor model trained only on clean speech. |
|---|---|
| Author | Vale, Vinjai; Isik, Umut; Valin, Jean-Marc; Casebeer, Jonah; Krishnaswamy, Arvindh; Giri, Ritwik |
| Author_xml | 1. Jonah Casebeer (University of Illinois at Urbana-Champaign); 2. Vinjai Vale (Stanford University); 3. Umut Isik (Amazon Web Services); 4. Jean-Marc Valin (Amazon Web Services); 5. Ritwik Giri (Amazon Web Services); 6. Arvindh Krishnaswamy (Amazon Web Services) |
| ContentType | Conference Proceeding |
| DOI | 10.1109/ICASSP39728.2021.9414605 |
| Discipline | Engineering |
| EISBN | 9781728176055 1728176050 |
| EISSN | 2379-190X |
| EndPage | 715 |
| ExternalDocumentID | 9414605 |
| Genre | orig-research |
| ISICitedReferencesCount | 13 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| PageCount | 5 |
| PublicationCentury | 2000 |
| PublicationDate | 2021-June-6 |
| PublicationDateYYYYMMDD | 2021-06-06 |
| PublicationDate_xml | – month: 06 year: 2021 text: 2021-June-6 day: 06 |
| PublicationDecade | 2020 |
| PublicationTitle | Proceedings of the ... IEEE International Conference on Acoustics, Speech and Signal Processing (1998) |
| PublicationTitleAbbrev | ICASSP |
| PublicationYear | 2021 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| StartPage | 711 |
| SubjectTerms | audio compression; Codecs; Convolution; Convolutional codes; Decoding; Speech coding; Speech enhancement; Training |
| Title | Enhancing into the Codec: Noise Robust Speech Coding with Vector-Quantized Autoencoders |
| URI | https://ieeexplore.ieee.org/document/9414605 |