MOS-FAD: Improving Fake Audio Detection Via Automatic Mean Opinion Score Prediction

Bibliographic details
Published in:	Proceedings of the ... IEEE International Conference on Acoustics, Speech and Signal Processing (1998) pp. 876-880
Main authors:	Zhou, Wangjin; Yang, Zhengdong; Chu, Chenhui; Li, Sheng; Dabre, Raj; Zhao, Yi; Kawahara, Tatsuya
Format:	Conference proceedings
Language:	English
Published:	IEEE, 14 April 2024
Subjects:	fake audio detection (FAD); Logic gates; model fusion; MOS prediction; Predictive models; self-supervised learned (SSL) model; Signal processing; Speech synthesis; Task analysis; Training; Training data
ISSN:	2379-190X
Online access:	Full text

Description
Summary:	Automatic Mean Opinion Score (MOS) prediction is used to evaluate the quality of synthetic speech. This study extends predicted MOS to the task of Fake Audio Detection (FAD), on the expectation that MOS can assess how close synthesized speech is to the natural human voice. We propose MOS-FAD, in which MOS is leveraged at two key points in FAD: training data selection and model fusion. In training data selection, we demonstrate that MOS enables effective filtering of samples from unbalanced datasets. In model fusion, our results demonstrate that incorporating MOS as a gating mechanism enhances overall performance.
DOI:	10.1109/ICASSP48485.2024.10446041
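
The summary names two uses of predicted MOS: filtering training samples and gating a fusion of detectors. The record gives no implementation details, so the following is only a minimal sketch under stated assumptions, not the authors' method. The function names, the threshold-based filter, the logistic gate, and its `center` and `sharpness` parameters are all hypothetical choices made for illustration.

```python
import math


def select_by_mos(samples, min_mos):
    """Keep samples whose predicted MOS is at least min_mos.

    Assumption: each sample is a dict with a "mos" key holding a
    predicted score on the usual 1-5 scale. The paper may filter
    differently (for example, in the other direction).
    """
    return [s for s in samples if s["mos"] >= min_mos]


def mos_gate(mos, center=3.0, sharpness=2.0):
    """Map a predicted MOS to a weight in (0, 1) with a logistic curve.

    Assumption: a fixed logistic gate; a learned gate is equally plausible.
    """
    return 1.0 / (1.0 + math.exp(-sharpness * (mos - center)))


def fuse_scores(score_a, score_b, mos):
    """Blend two detector scores, weighting detector A more at high MOS."""
    g = mos_gate(mos)
    return g * score_a + (1.0 - g) * score_b


if __name__ == "__main__":
    data = [
        {"id": "utt1", "mos": 4.2},
        {"id": "utt2", "mos": 2.1},
        {"id": "utt3", "mos": 3.5},
    ]
    kept = select_by_mos(data, min_mos=3.0)
    print([s["id"] for s in kept])
    print(fuse_scores(0.2, 0.8, mos=3.0))
```

In this sketch the gate is a convex weight, so the fused score always lies between the two detector scores; which detector should dominate at high or low MOS is a design choice the summary does not specify.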