Architecture for Variable Bitrate Neural Speech Codec with Configurable Computation Complexity

Bibliographic details
Published in: Proceedings of the ... IEEE International Conference on Acoustics, Speech and Signal Processing (1998), pp. 861-865
Main authors: Jayashankar, Tejas; Koehler, Thilo; Kalgaonkar, Kaustubh; Xiu, Zhiping; Wu, Jilong; Lin, Ju; Agrawal, Prabhav; He, Qing
Format: Conference paper
Language: English
Published: IEEE, 23 May 2022
ISSN: 2379-190X
Abstract: Low-bitrate speech codecs have become an area of intense research. Traditional speech codecs, which use signal processing methods to encode and decode speech, often suffer from quality issues at low bitrates. A neural speech codec, which uses a deep neural network in the compression pipeline, can help alleviate this issue. In this paper we present a new neural speech codec that 1) supports variable bitrates, 2) supports packet losses of up to 120 ms, and 3) can operate in low-compute and high-compute modes. Our codec uses a hierarchical VQ-VAE (HVQVAE) for encoding and decoding spectral features at different bitrates. The decoded features are fed to a vocoder for speech synthesis. Depending on the end user's computing resources, the decoder uses either a powerful WaveRNN or a parametric vocoder for speech synthesis. Our experiments demonstrate that our HVQVAE + WaveRNN setup achieves high audio quality.
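The abstract names vector quantization and variable bitrates but gives no numbers. The following is a minimal sketch of the general idea only, not the authors' implementation. It shows a nearest-codebook lookup, the core step of any VQ-VAE, and how the bitrate follows from the code rate and codebook size of each hierarchy level. The codebook entries, the code rates, the codebook sizes, and the assumption that a higher bitrate is obtained by sending an additional level are all hypothetical and are not taken from the paper.

```python
import math


def quantize(vector, codebook):
    """Return the index of the codebook entry nearest to `vector`
    (squared Euclidean distance), as in a standard VQ lookup."""
    best_index, best_dist = 0, float("inf")
    for index, entry in enumerate(codebook):
        dist = sum((v - e) ** 2 for v, e in zip(vector, entry))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def bitrate_bps(levels):
    """Bitrate in bits per second for a list of
    (codes_per_second, codebook_size) hierarchy levels."""
    return sum(rate * math.log2(size) for rate, size in levels)


# Hypothetical 2-D codebook with three entries (illustration only).
codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.5]]
index = quantize([0.9, 1.2], codebook)

# Hypothetical hierarchy: a coarse level alone, then coarse plus fine.
low = bitrate_bps([(25, 256)])
high = bitrate_bps([(25, 256), (50, 256)])
```

With these assumed values, each code drawn from a 256-entry codebook costs 8 bits, so the bitrate scales with how many codes per second are transmitted across the levels in use.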
Author (in publication order, with affiliation)
Jayashankar, Tejas (Massachusetts Institute of Technology)
Koehler, Thilo (Facebook AI)
Kalgaonkar, Kaustubh (Facebook AI)
Xiu, Zhiping (Facebook AI)
Wu, Jilong (Facebook AI)
Lin, Ju (Facebook AI)
Agrawal, Prabhav (Facebook AI)
He, Qing (Facebook AI)
ContentType Conference Proceeding
DOI 10.1109/ICASSP43922.2022.9747419
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP) 1998-present
Discipline Engineering
EISBN 1665405406
9781665405409
EISSN 2379-190X
EndPage 865
ExternalDocumentID 9747419
Genre orig-research
ISICitedReferencesCount 3
IngestDate Wed Aug 27 02:25:06 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
PageCount 5
PublicationCentury 2000
PublicationDate 2022-May-23
PublicationDateYYYYMMDD 2022-05-23
PublicationDate_xml – month: 05
  year: 2022
  text: 2022-May-23
  day: 23
PublicationDecade 2020
PublicationTitle Proceedings of the ... IEEE International Conference on Acoustics, Speech and Signal Processing (1998)
PublicationTitleAbbrev ICASSP
PublicationYear 2022
Publisher IEEE
Publisher_xml – name: IEEE
SourceID ieee
SourceType Publisher
StartPage 861
SubjectTerms Bit rate
Packet loss
Pipelines
Speech codec
Speech codecs
Speech coding
Training
Variable Rate
Vocoders
VQ-VAE
WaveRNN
Title Architecture for Variable Bitrate Neural Speech Codec with Configurable Computation Complexity
URI https://ieeexplore.ieee.org/document/9747419