Preserving Speaker Identity in Speech-to-Speech Translation: An Exploration of Attention-Based Approaches

Bibliographic details
Published in:	International Conference on Computing Communication Control and Automation (Online), pp. 1-6
Main authors:	Jaybhaye, S. M; Lale, Yogesh; Kulkarni, Parth; Diwnale, Tanvi; Kota, Apurva
Format:	Conference paper
Language:	English
Published:	IEEE, 23 August 2024
Subjects:	Accuracy; Attention mechanisms; Automation; Bridges; Computer architecture; Decoding; encoder-decoder architectures; Focusing; Measurement; speaker embeddings; speaker identity; Speaker recognition; speech-to-speech translation
ISSN:	2771-1358

Description
Summary:	Effective speech-to-speech translation (STST) requires not only accurate linguistic conversion but also preservation of the speaker's unique vocal identity. This paper investigates the efficacy of attention-based encoder-decoder architectures in achieving this goal. It explores the impact of incorporating speaker embeddings through various attention mechanisms, including speaker-aware self-attention, cross-attention with speaker embeddings, and a dedicated speaker attention module within the decoder. Using the CVSS multilingual dataset, the approach is evaluated through objective metrics (BLEU, WER, speaker recognition accuracy, cosine similarity, FID) and subjective human perception studies. The results demonstrate that dedicated speaker attention and cross-attention mechanisms within the decoder significantly enhance speaker identity preservation without compromising translation accuracy. These results pave the way for the development of STST systems that deliver both accurate content and natural, personalized communication experiences.
DOI:	10.1109/ICCUBEA61740.2024.10774862
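
Illustrative note: the summary names cross-attention with speaker embeddings and cosine similarity as a speaker-identity metric. The sketch below shows, in general terms only, what those two ideas look like. It is not the authors' implementation: the function names, the use of several reference speaker vectors as both keys and values, and the absence of learned projections are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query, speaker_vectors):
    """Scaled dot-product attention of one decoder query over speaker vectors.

    Assumption: the speaker vectors serve as both keys and values, with no
    learned projection matrices (a real model would learn these).
    """
    scale = math.sqrt(len(query))
    scores = [
        sum(q * k for q, k in zip(query, vec)) / scale
        for vec in speaker_vectors
    ]
    weights = softmax(scores)
    dim = len(speaker_vectors[0])
    context = [
        sum(w * vec[i] for w, vec in zip(weights, speaker_vectors))
        for i in range(dim)
    ]
    return context, weights


if __name__ == "__main__":
    # Hypothetical 2-dimensional embeddings, chosen only to keep the example small.
    source_speaker = [0.9, 0.1]
    translated_speaker = [0.8, 0.2]
    print(cosine_similarity(source_speaker, translated_speaker))

    context, weights = cross_attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
    print(context, weights)
```

In an evaluation of the kind the summary describes, a cosine similarity closer to 1 between the source-speaker and output-speaker embeddings would indicate better identity preservation. How the paper extracts those embeddings is not stated in this record.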