Architecture for Variable Bitrate Neural Speech Codec with Configurable Computation Complexity

Detailed bibliography
Published in:	Proceedings of the ... IEEE International Conference on Acoustics, Speech and Signal Processing (1998), pp. 861-865
Main authors:	Jayashankar, Tejas; Koehler, Thilo; Kalgaonkar, Kaustubh; Xiu, Zhiping; Wu, Jilong; Lin, Ju; Agrawal, Prabhav; He, Qing
Format:	Conference paper
Language:	English
Published:	IEEE, 23 May 2022
Subjects:	Bit rate, Packet loss, Pipelines, Speech codec, Speech codecs, Speech coding, Training, Variable Rate, Vocoders, VQ-VAE, WaveRNN
ISSN:	2379-190X

Abstract	Low-bitrate speech codecs have become an area of intense research. Traditional speech codecs, which use signal processing methods to encode and decode speech, often suffer from quality issues at low bitrates. A neural speech codec, which uses a deep neural network in the compression pipeline, can help alleviate this issue. In this paper we present a new neural speech codec that 1) supports variable bitrates, 2) supports packet losses of up to 120 ms, and 3) can operate in low-compute and high-compute modes. Our codec uses a hierarchical VQ-VAE (HVQVAE) for encoding and decoding spectral features at different bitrates. The decoded features are fed to a vocoder for speech synthesis. Depending on the end user's computing resources, the decoder uses either a powerful WaveRNN or a parametric vocoder for speech synthesis. Our experiments demonstrate that our HVQVAE + WaveRNN setup achieves high audio quality.
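The abstract does not say how the HVQVAE ties bitrate to its quantizers. The following is a minimal Python sketch, not the authors' method: it substitutes residual (multi-stage) vector quantization for the hierarchical VQ-VAE, and the toy codebooks, the 256-entry codebook sizes, and the 50 frames/s rate are all hypothetical. It illustrates one common way a VQ codec can offer variable bitrates: transmitting fewer quantizer levels lowers the bitrate, at the cost of a coarser reconstruction. The bitrate follows from sending one index per transmitted level per frame, i.e. frames per second times the sum of log2(codebook size) over those levels.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def nearest_code(x: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the codeword closest to x in squared Euclidean distance."""
    best_idx, best_dist = 0, math.inf
    for i, code in enumerate(codebook):
        dist = sum((xi - ci) ** 2 for xi, ci in zip(x, code))
        if dist < best_dist:
            best_idx, best_dist = i, dist
    return best_idx


def residual_quantize(
    x: Sequence[float], codebooks: Sequence[Sequence[Sequence[float]]], num_levels: int
) -> Tuple[List[int], Vector]:
    """Quantize x with the first num_levels codebooks; each level codes what earlier levels missed."""
    residual = list(x)
    recon = [0.0] * len(x)
    indices: List[int] = []
    for codebook in codebooks[:num_levels]:
        idx = nearest_code(residual, codebook)
        indices.append(idx)
        recon = [r + c for r, c in zip(recon, codebook[idx])]
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return indices, recon


def bitrate_bps(codebook_sizes: Sequence[int], num_levels: int, frames_per_second: float) -> float:
    """Bits per second when one index per transmitted level is sent for every frame."""
    bits_per_frame = sum(math.log2(k) for k in codebook_sizes[:num_levels])
    return bits_per_frame * frames_per_second


# Hypothetical toy codebooks for a 2-D feature vector (not taken from the paper).
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0]],                              # coarse level
    [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25], [-0.25, 0.0], [0.0, -0.25]],  # refinement level
]

if __name__ == "__main__":
    frame = [1.2, 0.9]
    for levels in (1, 2):
        idx, recon = residual_quantize(frame, CODEBOOKS, levels)
        err = sum((a - b) ** 2 for a, b in zip(frame, recon))
        print(levels, idx, [round(v, 2) for v in recon], round(err, 4))
    # Hypothetical configuration: three levels of 256 codewords at 50 frames/s.
    for levels in (1, 2, 3):
        print(levels, bitrate_bps([256, 256, 256], levels, 50))
```

With these toy values the second level reduces the reconstruction error for the example frame, and the bitrate loop prints 400.0, 800.0 and 1200.0 bits/s for one, two and three levels. Residual quantization is used here only because it makes the bitrate/quality trade-off easy to see; a hierarchical VQ-VAE may instead place its levels at different temporal resolutions, in which case each level's term in the sum would use its own frame rate.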
Author_xml	1. Jayashankar, Tejas (Massachusetts Institute of Technology); 2. Koehler, Thilo (Facebook AI); 3. Kalgaonkar, Kaustubh (Facebook AI); 4. Xiu, Zhiping (Facebook AI); 5. Wu, Jilong (Facebook AI); 6. Lin, Ju (Facebook AI); 7. Agrawal, Prabhav (Facebook AI); 8. He, Qing (Facebook AI)
ContentType	Conference Proceeding
DOI	10.1109/ICASSP43922.2022.9747419
Discipline	Engineering
EISBN	1665405406 9781665405409
EISSN	2379-190X
EndPage	865
ExternalDocumentID	9747419
Genre	orig-research
ISICitedReferencesCount	3
IsPeerReviewed	false
IsScholarly	true
Language	English
PageCount	5
PublicationCentury	2000
PublicationDate	2022-May-23
PublicationDecade	2020
PublicationTitle	Proceedings of the ... IEEE International Conference on Acoustics, Speech and Signal Processing (1998)
PublicationTitleAbbrev	ICASSP
PublicationYear	2022
Publisher	IEEE
SourceType	Publisher
StartPage	861
URI	https://ieeexplore.ieee.org/document/9747419