Architecture for Variable Bitrate Neural Speech Codec with Configurable Computation Complexity
Low bitrate speech codecs have become an area of intense research. Traditional speech codecs, which use signal processing methods to encode and decode speech, often suffer from quality issues at low bitrates. A neural speech codec, which uses a deep neural network in the compression pipeline, can he...
| Published in: | Proceedings of the ... IEEE International Conference on Acoustics, Speech and Signal Processing (1998), pp. 861-865 |
|---|---|
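| Main authors: | Jayashankar, Tejas; Koehler, Thilo; Kalgaonkar, Kaustubh; Xiu, Zhiping; Wu, Jilong; Lin, Ju; Agrawal, Prabhav; He, Qing |
| Format: | Conference paper |
| Language: | English |
| Published: | IEEE, 23 May 2022 |
| Subjects: | |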
| ISSN: | 2379-190X |
| Online access: | Get full text |
| Abstract | Low bitrate speech codecs have become an area of intense research. Traditional speech codecs, which use signal processing methods to encode and decode speech, often suffer from quality issues at low bitrates. A neural speech codec, which uses a deep neural network in the compression pipeline, can help alleviate this issue. In this paper we present a new neural speech codec that 1) supports variable bitrates, 2) tolerates packet losses of up to 120 ms, and 3) can operate in low-compute and high-compute modes. Our codec uses a hierarchical VQ-VAE (HVQVAE) for encoding and decoding spectral features at different bitrates. The decoded features are fed to a vocoder for speech synthesis. Depending upon the end user's computing resources, the decoder uses either a powerful WaveRNN or a parametric vocoder for speech synthesis. Our experiments demonstrate that our HVQVAE + WaveRNN setup achieves high audio quality. |
|---|---|
| Author | Agrawal, Prabhav; Wu, Jilong; Xiu, Zhiping; Lin, Ju; Koehler, Thilo; He, Qing; Kalgaonkar, Kaustubh; Jayashankar, Tejas |
| Author_xml | 1. Jayashankar, Tejas (Massachusetts Institute of Technology); 2. Koehler, Thilo (Facebook AI); 3. Kalgaonkar, Kaustubh (Facebook AI); 4. Xiu, Zhiping (Facebook AI); 5. Wu, Jilong (Facebook AI); 6. Lin, Ju (Facebook AI); 7. Agrawal, Prabhav (Facebook AI); 8. He, Qing (Facebook AI) |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IH CBEJK RIE RIO |
| DOI | 10.1109/ICASSP43922.2022.9747419 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings IEEE Proceedings Order Plan (POP) 1998-present by volume IEEE Xplore All Conference Proceedings IEEE Electronic Library (IEL) IEEE Proceedings Order Plans (POP) 1998-present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Engineering |
| EISBN | 1665405406 9781665405409 |
| EISSN | 2379-190X |
| EndPage | 865 |
| ExternalDocumentID | 9747419 |
| Genre | orig-research |
| ISICitedReferencesCount | 3 |
| IngestDate | Wed Aug 27 02:25:06 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| LinkModel | DirectLink |
| PageCount | 5 |
| ParticipantIDs | ieee_primary_9747419 |
| PublicationCentury | 2000 |
| PublicationDate | 2022-May-23 |
| PublicationDateYYYYMMDD | 2022-05-23 |
| PublicationDate_xml | – month: 05 year: 2022 text: 2022-May-23 day: 23 |
| PublicationDecade | 2020 |
| PublicationTitle | Proceedings of the ... IEEE International Conference on Acoustics, Speech and Signal Processing (1998) |
| PublicationTitleAbbrev | ICASSP |
| PublicationYear | 2022 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SSID | ssj0008748 |
| Score | 2.2499988 |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 861 |
| SubjectTerms | Bit rate; Packet loss; Pipelines; Speech codec; Speech codecs; Speech coding; Training; Variable Rate; Vocoders; VQ-VAE; WaveRNN |
| Title | Architecture for Variable Bitrate Neural Speech Codec with Configurable Computation Complexity |
| URI | https://ieeexplore.ieee.org/document/9747419 |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
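
The abstract describes three mechanisms: vector-quantized encoding of spectral features, a bitrate that varies with how much of the hierarchy is transmitted, and a decoder that picks its vocoder from the available compute. The following is a minimal sketch of those ideas, not the authors' implementation. It assumes one codebook index per hierarchy level per frame; the function names, codebook sizes, and frame rate are hypothetical and do not come from the paper.

```python
import math


def quantize(vector, codebook):
    """Return the index of the codebook entry nearest to `vector`
    (squared Euclidean distance), as in a plain VQ lookup."""
    def dist(entry):
        return sum((v - e) ** 2 for v, e in zip(vector, entry))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def bitrate_bps(codebook_sizes, frames_per_second):
    """Bits per second when one index per codebook is sent for every frame.
    Assumption: each hierarchy level contributes log2(size) bits per frame."""
    bits_per_frame = sum(math.log2(size) for size in codebook_sizes)
    return bits_per_frame * frames_per_second


def pick_vocoder(high_compute):
    """Choose the synthesis back end from the receiver's compute budget."""
    return "WaveRNN" if high_compute else "parametric"


# Hypothetical toy values, not taken from the paper.
codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
index = quantize([0.9, 1.2], codebook)

# Sending only the coarse level lowers the bitrate; adding a level raises it.
low_rate = bitrate_bps([256], frames_per_second=50)
high_rate = bitrate_bps([256, 256], frames_per_second=50)
```

With these toy numbers, dropping the second level halves the bits sent per frame, which is the kind of trade-off a variable-bitrate hierarchy makes available.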