Enhancing into the Codec: Noise Robust Speech Coding with Vector-Quantized Autoencoders

Audio codecs based on discretized neural autoencoders have recently been developed and shown to provide significantly higher compression levels for comparable quality speech output. However, these models are tightly coupled with speech content, and produce unintended outputs in noisy conditions. Ba...

Bibliographic Details
Published in:	Proceedings of the ... IEEE International Conference on Acoustics, Speech and Signal Processing (1998) pp. 711 - 715
Main Authors:	Casebeer, Jonah; Vale, Vinjai; Isik, Umut; Valin, Jean-Marc; Giri, Ritwik; Krishnaswamy, Arvindh
Format:	Conference Proceeding
Language:	English
Published:	IEEE 06.06.2021
Subjects:	audio compression; Codecs; Convolution; Convolutional codes; Decoding; Speech coding; Speech enhancement; Training
ISSN:	2379-190X

Abstract	Audio codecs based on discretized neural autoencoders have recently been developed and shown to provide significantly higher compression levels for comparable quality speech output. However, these models are tightly coupled with speech content, and produce unintended outputs in noisy conditions. Based on VQ-VAE autoencoders with WaveRNN decoders, we develop compressor-enhancer encoders and accompanying decoders, and show that they operate well in noisy conditions. We also observe that a compressor-enhancer model performs better on clean speech inputs than a compressor model trained only on clean speech.
Author	Vale, Vinjai; Isik, Umut; Valin, Jean-Marc; Casebeer, Jonah; Krishnaswamy, Arvindh; Giri, Ritwik
Author_xml	– sequence: 1 givenname: Jonah surname: Casebeer fullname: Casebeer, Jonah organization: University of Illinois at Urbana-Champaign – sequence: 2 givenname: Vinjai surname: Vale fullname: Vale, Vinjai organization: Stanford University – sequence: 3 givenname: Umut surname: Isik fullname: Isik, Umut organization: Amazon Web Services – sequence: 4 givenname: Jean-Marc surname: Valin fullname: Valin, Jean-Marc organization: Amazon Web Services – sequence: 5 givenname: Ritwik surname: Giri fullname: Giri, Ritwik organization: Amazon Web Services – sequence: 6 givenname: Arvindh surname: Krishnaswamy fullname: Krishnaswamy, Arvindh organization: Amazon Web Services
ContentType	Conference Proceeding
DBID	6IE 6IH CBEJK RIE RIO
DOI	10.1109/ICASSP39728.2021.9414605
DatabaseName	IEEE Electronic Library (IEL) Conference Proceedings IEEE Proceedings Order Plan (POP) 1998-present by volume IEEE Xplore All Conference Proceedings IEEE/IET Electronic Library IEEE Proceedings Order Plans (POP) 1998-present
DatabaseTitleList
Database_xml	– sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher
DeliveryMethod	fulltext_linktorsrc
Discipline	Engineering
EISBN	9781728176055 1728176050
EISSN	2379-190X
EndPage	715
ExternalDocumentID	9414605
Genre	orig-research
GroupedDBID	23M 6IE 6IF 6IH 6IK 6IL 6IM 6IN AAJGR AAWTH ABLEC ACGFS ADZIZ ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK CHZPO IEGSK IJVOP IPLJI M43 OCL RIE RIL RIO RNS
ID	FETCH-LOGICAL-i203t-c30aa447d645443e828bf4740a80840400b31acf95c7d485e347b50c791b72703
IEDL.DBID	RIE
ISICitedReferencesCount	13
ISICitedReferencesURI	http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000704288400143&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
IngestDate	Wed Aug 27 02:39:02 EDT 2025
IsPeerReviewed	false
IsScholarly	true
Language	English
LinkModel	DirectLink
MergedId	FETCHMERGED-LOGICAL-i203t-c30aa447d645443e828bf4740a80840400b31acf95c7d485e347b50c791b72703
PageCount	5
ParticipantIDs	ieee_primary_9414605
PublicationCentury	2000
PublicationDate	2021-June-6
PublicationDateYYYYMMDD	2021-06-06
PublicationDate_xml	– month: 06 year: 2021 text: 2021-June-6 day: 06
PublicationDecade	2020
PublicationTitle	Proceedings of the ... IEEE International Conference on Acoustics, Speech and Signal Processing (1998)
PublicationTitleAbbrev	ICASSP
PublicationYear	2021
Publisher	IEEE
Publisher_xml	– name: IEEE
SSID	ssj0008748
Score	2.3270369
SourceID	ieee
SourceType	Publisher
StartPage	711
SubjectTerms	audio compression; Codecs; Convolution; Convolutional codes; Decoding; Speech coding; Speech enhancement; Training
Title	Enhancing into the Codec: Noise Robust Speech Coding with Vector-Quantized Autoencoders
URI	https://ieeexplore.ieee.org/document/9414605
WOSCitedRecordID	wos000704288400143&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText	1
inHoldings	1
linkProvider	IEEE
openUrl	ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=proceeding&rft.title=Proceedings+of+the+...+IEEE+International+Conference+on+Acoustics%2C+Speech+and+Signal+Processing+%281998%29&rft.atitle=Enhancing+into+the+Codec%3A+Noise+Robust+Speech+Coding+with+Vector-Quantized+Autoencoders&rft.au=Casebeer%2C+Jonah&rft.au=Vale%2C+Vinjai&rft.au=Isik%2C+Umut&rft.au=Valin%2C+Jean-Marc&rft.date=2021-06-06&rft.pub=IEEE&rft.eissn=2379-190X&rft.spage=711&rft.epage=715&rft_id=info:doi/10.1109%2FICASSP39728.2021.9414605&rft.externalDocID=9414605