Utterance Style Transfer Using Deep Models

Bibliographic details
Published in: Procedia Computer Science, Volume 192, pp. 2132-2141
Main authors: Popek, Daniel; Markowska-Kaczmar, Urszula
Format: Journal Article
Language: English
Published: Elsevier B.V., 2021
ISSN: 1877-0509
Description
Summary: The paper describes a solution to utterance style transfer in which the speaker’s identity and emotional tone are exchanged while the utterance’s content is maintained. Using deep generative neural networks, we developed two models that differ in how they introduce information related to the speaker’s identity and an additional variable representing the expected emotional category. The emotion embedding is taken from a convolutional network that was trained to classify emotional categories. A Siamese network is responsible for learning the content embeddings. The models can perform style transfer between any two speakers with satisfactory results. This assessment is based on a survey that considers the degree of content retention, the quality of transferring voice features related to identity, and the degree to which emotional features are converted into the desired category.
DOI: 10.1016/j.procs.2021.08.226