Bibliographic Details
| Title: |
An Adversarial Training Based Speech Emotion Classifier With Isolated Gaussian Regularization. |
| Authors: |
Fu, Changzeng, Liu, Chaoran, Ishi, Carlos Toshinori, Ishiguro, Hiroshi |
| Source: |
IEEE Transactions on Affective Computing; Jul-Sep 2023, Vol. 14 Issue 3, p2361-2374, 14p |
| Abstract: |
Speaker-individual bias may cause emotion-related features to form clusters with irregular borders (non-Gaussian distributions), making the model sensitive to local irregularities of pattern distributions and causing it to overfit the in-domain dataset. This problem may decrease validation scores in cross-domain (i.e., speaker-independent, channel-variant) settings. To mitigate this problem, in this paper, we propose an adversarial training-based classifier that regularizes the distribution of latent representations to further smooth the boundaries among different categories. In the regularization phase, the representations are mapped into Gaussian distributions in an unsupervised manner to improve the discriminative ability of the latent representations. In our previous study, a single Gaussian distribution was used for mapping the latent representations. In the present work, we adopt a mixture of isolated Gaussian distributions. Moreover, multi-instance learning was adopted by dividing speech into a bag of segments to capture the most salient part of an emotional expression. The model was evaluated on the IEMOCAP and MELD datasets in in-corpus speaker-independent settings. In addition, we investigated accuracy in cross-corpus settings that simulate speaker-independent and channel-variant conditions. In the experiment, the proposed model was compared not only with baseline models but also with different configurations of our model. The results show that the proposed model is competitive with respect to the baseline, as demonstrated by both in-corpus and cross-corpus validation. [ABSTRACT FROM AUTHOR] |
| Database: |
Complementary Index |