Enhancing into the Codec: Noise Robust Speech Coding with Vector-Quantized Autoencoders

Bibliographic details
Published in: Proceedings of the ... IEEE International Conference on Acoustics, Speech and Signal Processing (1998) pp. 711-715
Main authors: Casebeer, Jonah; Vale, Vinjai; Isik, Umut; Valin, Jean-Marc; Giri, Ritwik; Krishnaswamy, Arvindh
Format: Conference paper
Language: English
Published: IEEE 2021-06-06
ISSN: 2379-190X
Abstract Audio codecs based on discretized neural autoencoders have recently been developed and shown to provide significantly higher compression levels for comparable-quality speech output. However, these models are tightly coupled with speech content and produce unintended outputs in noisy conditions. Based on VQ-VAE autoencoders with WaveRNN decoders, we develop compressor-enhancer encoders and accompanying decoders, and show that they operate well in noisy conditions. We also observe that a compressor-enhancer model performs better on clean speech inputs than a compressor model trained only on clean speech.
Author Vale, Vinjai
Isik, Umut
Valin, Jean-Marc
Casebeer, Jonah
Krishnaswamy, Arvindh
Giri, Ritwik
Author_xml – sequence: 1
  givenname: Jonah
  surname: Casebeer
  fullname: Casebeer, Jonah
  organization: University of Illinois at Urbana-Champaign
– sequence: 2
  givenname: Vinjai
  surname: Vale
  fullname: Vale, Vinjai
  organization: Stanford University
– sequence: 3
  givenname: Umut
  surname: Isik
  fullname: Isik, Umut
  organization: Amazon Web Services
– sequence: 4
  givenname: Jean-Marc
  surname: Valin
  fullname: Valin, Jean-Marc
  organization: Amazon Web Services
– sequence: 5
  givenname: Ritwik
  surname: Giri
  fullname: Giri, Ritwik
  organization: Amazon Web Services
– sequence: 6
  givenname: Arvindh
  surname: Krishnaswamy
  fullname: Krishnaswamy, Arvindh
  organization: Amazon Web Services
ContentType Conference Proceeding
DBID 6IE
6IH
CBEJK
RIE
RIO
DOI 10.1109/ICASSP39728.2021.9414605
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP) 1998-present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Engineering
EISBN 9781728176055
1728176050
EISSN 2379-190X
EndPage 715
ExternalDocumentID 9414605
Genre orig-research
IEDL.DBID RIE
ISICitedReferencesCount 13
IngestDate Wed Aug 27 02:39:02 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
PageCount 5
ParticipantIDs ieee_primary_9414605
PublicationCentury 2000
PublicationDate 2021-June-6
PublicationDateYYYYMMDD 2021-06-06
PublicationDate_xml – month: 06
  year: 2021
  text: 2021-June-6
  day: 06
PublicationDecade 2020
PublicationTitle Proceedings of the ... IEEE International Conference on Acoustics, Speech and Signal Processing (1998)
PublicationTitleAbbrev ICASSP
PublicationYear 2021
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssj0008748
Score 2.3270369
SourceID ieee
SourceType Publisher
StartPage 711
SubjectTerms audio compression
Codecs
Convolution
Convolutional codes
Decoding
Speech coding
Speech enhancement
Training
Title Enhancing into the Codec: Noise Robust Speech Coding with Vector-Quantized Autoencoders
URI https://ieeexplore.ieee.org/document/9414605
WOSCitedRecordID wos000704288400143
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
linkProvider IEEE