Utterance Style Transfer Using Deep Models

Bibliographic Details
Published in: Procedia Computer Science, Vol. 192, pp. 2132-2141
Main Authors: Popek, Daniel; Markowska-Kaczmar, Urszula
Format: Journal Article
Language: English
Published: Elsevier B.V., 2021
ISSN: 1877-0509
Description
Summary: The paper describes an approach to utterance style transfer that exchanges the speaker’s identity and emotional tone while preserving the utterance’s content. Using deep generative neural networks, we developed two models that differ in how they introduce information related to the speaker’s identity and an additional variable representing the expected emotional category. The emotion embedding is taken from a convolutional network trained to classify emotional categories. A Siamese network learns the content embeddings. The models can perform style transfer between any two speakers with satisfactory results. This assessment is based on a survey that rates the degree of content retention, the quality of transferring identity-related voice features, and the degree to which emotional features are converted into the desired category.
DOI: 10.1016/j.procs.2021.08.226
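
The summary says that a Siamese network learns the content embeddings, but this record does not say which training objective the authors used. As a rough illustration only, the sketch below shows a contrastive loss, one common choice for Siamese training, and not necessarily the one used in the paper. The function names, the `margin` value, and the toy vectors are all assumptions made for this example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a, emb_b, same_content, margin=1.0):
    """Contrastive loss for one pair of embeddings (illustrative only).

    Pairs with the same content are pulled together (loss = d^2).
    Pairs with different content are pushed apart until they are at
    least `margin` away (loss = max(0, margin - d)^2).
    The margin of 1.0 is an assumed value, not taken from the paper.
    """
    d = euclidean(emb_a, emb_b)
    if same_content:
        return d ** 2
    return max(0.0, margin - d) ** 2


# Toy embeddings with made-up numbers; a real system would obtain them
# from the twin branches of the Siamese network.
same_pair = contrastive_loss([0.1, 0.2], [0.1, 0.2], same_content=True)
far_pair = contrastive_loss([0.0, 0.0], [3.0, 4.0], same_content=False)
near_pair = contrastive_loss([0.0, 0.0], [0.3, 0.4], same_content=False)
print(same_pair, far_pair, near_pair)
```

Under this objective, two utterances with the same content spoken by different speakers would be driven toward nearby embeddings, which matches the stated goal of keeping content separate from speaker identity and emotion.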