Architecture for Variable Bitrate Neural Speech Codec with Configurable Computation Complexity

Low bitrate speech codecs have become an area of intense research. Traditional speech codecs, which use signal processing methods to encode and decode speech, often suffer from quality issues at low bitrates. A neural speech codec, which uses a deep neural network in the compression pipeline, can he...

Celý popis

Uloženo v:
Podrobná bibliografie
Vydáno v:Proceedings of the ... IEEE International Conference on Acoustics, Speech and Signal Processing (1998) s. 861 - 865
Hlavní autoři: Jayashankar, Tejas, Koehler, Thilo, Kalgaonkar, Kaustubh, Xiu, Zhiping, Wu, Jilong, Lin, Ju, Agrawal, Prabhav, He, Qing
Médium: Konferenční příspěvek
Jazyk:angličtina
Vydáno: IEEE 23.05.2022
Témata:
ISSN:2379-190X
On-line přístup:Získat plný text
Tagy: Přidat tag
Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!
Popis
Shrnutí:Low bitrate speech codecs have become an area of intense research. Traditional speech codecs, which use signal processing methods to encode and decode speech, often suffer from quality issues at low bitrates. A neural speech codec, which uses a deep neural network in the compression pipeline, can help alleviate this issue. In this paper we present a new neural speech codec that: 1) supports variable bitrates 2) supports packet losses of up to 120 ms and 3) can operate at low-compute and high-compute modes. Our codec uses a hierarchical VQ-VAE (HVQVAE) for encoding and decoding spectral features at different bitrates. The decoded features are fed to a vocoder for speech synthesis. Depending upon the end user's computing resources, the decoder either uses a powerful WaveRNN or a parametric vocoder for speech synthesis. Our experiments demonstrate that our HVQVAE + WaveRNN setup achieves high audio quality.
ISSN:2379-190X
DOI:10.1109/ICASSP43922.2022.9747419