Enhancing into the Codec: Noise Robust Speech Coding with Vector-Quantized Autoencoders

Audio codecs based on discretized neural autoencoders have recently been developed and shown to provide significantly higher compression levels for comparable quality speech output. However, these models are tightly coupled with speech content, and produce unintended outputs in noisy conditions. Ba...

Detailed description


Bibliographic details
Published in:	Proceedings of the ... IEEE International Conference on Acoustics, Speech and Signal Processing (1998), pp. 711-715
Main authors:	Casebeer, Jonah; Vale, Vinjai; Isik, Umut; Valin, Jean-Marc; Giri, Ritwik; Krishnaswamy, Arvindh
Format:	Conference proceeding
Language:	English
Published:	IEEE, 2021-06-06
Subjects:	audio compression; Codecs; Convolution; Convolutional codes; Decoding; Speech coding; Speech enhancement; Training
ISSN:	2379-190X
Online access:	Full text

Abstract	Audio codecs based on discretized neural autoencoders have recently been developed and shown to provide significantly higher compression levels for comparable quality speech output. However, these models are tightly coupled with speech content, and produce unintended outputs in noisy conditions. Based on VQ-VAE autoencoders with WaveRNN decoders, we develop compressor-enhancer encoders and accompanying decoders, and show that they operate well in noisy conditions. We also observe that a compressor-enhancer model performs better on clean speech inputs than a compressor model trained only on clean speech.
Author	Vale, Vinjai; Isik, Umut; Valin, Jean-Marc; Casebeer, Jonah; Krishnaswamy, Arvindh; Giri, Ritwik
Author_xml	– sequence: 1 givenname: Jonah surname: Casebeer fullname: Casebeer, Jonah organization: University of Illinois at Urbana-Champaign – sequence: 2 givenname: Vinjai surname: Vale fullname: Vale, Vinjai organization: Stanford University – sequence: 3 givenname: Umut surname: Isik fullname: Isik, Umut organization: Amazon Web Services – sequence: 4 givenname: Jean-Marc surname: Valin fullname: Valin, Jean-Marc organization: Amazon Web Services – sequence: 5 givenname: Ritwik surname: Giri fullname: Giri, Ritwik organization: Amazon Web Services – sequence: 6 givenname: Arvindh surname: Krishnaswamy fullname: Krishnaswamy, Arvindh organization: Amazon Web Services
ContentType	Conference Proceeding
DBID	6IE 6IH CBEJK RIE RIO
DOI	10.1109/ICASSP39728.2021.9414605
DatabaseName	IEEE Electronic Library (IEL) Conference Proceedings IEEE Proceedings Order Plan (POP) 1998-present by volume IEEE Xplore All Conference Proceedings IEEE Electronic Library (IEL) IEEE Proceedings Order Plans (POP) 1998-present
DatabaseTitleList
Database_xml	– sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher
DeliveryMethod	fulltext_linktorsrc
Discipline	Engineering
EISBN	9781728176055 1728176050
EISSN	2379-190X
EndPage	715
ExternalDocumentID	9414605
Genre	orig-research
GroupedDBID	23M 6IE 6IF 6IH 6IK 6IL 6IM 6IN AAJGR AAWTH ABLEC ACGFS ADZIZ ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK CHZPO IEGSK IJVOP IPLJI M43 OCL RIE RIL RIO RNS
ID	FETCH-LOGICAL-i203t-c30aa447d645443e828bf4740a80840400b31acf95c7d485e347b50c791b72703
IEDL.DBID	RIE
ISICitedReferencesCount	13
ISICitedReferencesURI	http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000704288400143&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
IngestDate	Wed Aug 27 02:39:02 EDT 2025
IsPeerReviewed	false
IsScholarly	true
Language	English
LinkModel	DirectLink
MergedId	FETCHMERGED-LOGICAL-i203t-c30aa447d645443e828bf4740a80840400b31acf95c7d485e347b50c791b72703
PageCount	5
ParticipantIDs	ieee_primary_9414605
PublicationCentury	2000
PublicationDate	2021-June-6
PublicationDateYYYYMMDD	2021-06-06
PublicationDate_xml	– month: 06 year: 2021 text: 2021-June-6 day: 06
PublicationDecade	2020
PublicationTitle	Proceedings of the ... IEEE International Conference on Acoustics, Speech and Signal Processing (1998)
PublicationTitleAbbrev	ICASSP
PublicationYear	2021
Publisher	IEEE
Publisher_xml	– name: IEEE
SSID	ssj0008748
Score	2.3270369
Snippet	Audio codecs based on discretized neural autoencoders have recently been developed and shown to provide significantly higher compression levels for comparable...
SourceID	ieee
SourceType	Publisher
StartPage	711
SubjectTerms	audio compression; Codecs; Convolution; Convolutional codes; Decoding; Speech coding; Speech enhancement; Training
Title	Enhancing into the Codec: Noise Robust Speech Coding with Vector-Quantized Autoencoders
URI	https://ieeexplore.ieee.org/document/9414605
WOSCitedRecordID	wos000704288400143&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText	1
inHoldings	1
isFullTextHit
isPrint
linkProvider	IEEE
openUrl	ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=proceeding&rft.title=Proceedings+of+the+...+IEEE+International+Conference+on+Acoustics%2C+Speech+and+Signal+Processing+%281998%29&rft.atitle=Enhancing+into+the+Codec%3A+Noise+Robust+Speech+Coding+with+Vector-Quantized+Autoencoders&rft.au=Casebeer%2C+Jonah&rft.au=Vale%2C+Vinjai&rft.au=Isik%2C+Umut&rft.au=Valin%2C+Jean-Marc&rft.date=2021-06-06&rft.pub=IEEE&rft.eissn=2379-190X&rft.spage=711&rft.epage=715&rft_id=info:doi/10.1109%2FICASSP39728.2021.9414605&rft.externalDocID=9414605