Gigant-KTTS dataset: Towards building an extensive gigant dataset for Kurdish text-to-speech systems

Detailed bibliography
Published in: Data in Brief, Volume 55, p. 110753
Main authors: Ahmad, Hawraz A.; Rashid, Tarik A.
Format: Journal Article
Language: English
Published: Netherlands, Elsevier Inc, 1 August 2024
ISSN: 2352-3409
Description
Summary: Speech synthesis is now part of everyday computing around the world. The Central Kurdish speech corpus constructed here is a primary data source for developing a speech system. Two main issues still prevent such systems from achieving the best possible performance: inefficient training and analysis, and the difficulty of modelling. The biggest obstacle to text-to-speech for Kurdish is the lack of text and speech recognition tools, a problem compounded by the fact that around 30 million people speak Kurdish across different countries. To address this, the work introduces a large-vocabulary Kurdish Text-to-Speech dataset (KTTS, Gigant), comprising a pronunciation lexicon and a speech corpus for the Central Kurdish dialect. The sentences cover a variety of subjects and were recorded in a voice recording studio by a male Kurdish dubbing artist. The goal of the speech corpus is to provide a collection of sentences that accurately reflects real data on the Central Kurdish dialect. A combination of audio and visual sources was used to record the 6,078 sentences, which span 12 document topics. They were recorded in a controlled environment using low-noise microphones. The total recording duration is 13.63 h, and the recordings are in “.wav” format.
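The figures in the summary (6,078 sentences, 13.63 h, “.wav” files) can be sanity-checked with a short script. The following is a minimal sketch, not the authors' tooling: the function names and the directory argument are hypothetical, and it assumes the recordings are uncompressed PCM files readable by Python's standard `wave` module.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path):
    """Return the duration of one PCM .wav file in seconds."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def corpus_stats(corpus_dir):
    """Return (file_count, total_hours, mean_seconds) for all .wav files under corpus_dir.

    corpus_dir is a hypothetical local folder; the record does not describe
    how the dataset is laid out on disk.
    """
    durations = [wav_duration_seconds(p) for p in sorted(Path(corpus_dir).rglob("*.wav"))]
    total_seconds = sum(durations)
    mean_seconds = total_seconds / len(durations) if durations else 0.0
    return len(durations), total_seconds / 3600.0, mean_seconds


# Figures reported in the summary: 6,078 sentences, 13.63 h in total.
# This works out to roughly 8 seconds of audio per sentence.
reported_mean_seconds = 13.63 * 3600 / 6078
```

Running `corpus_stats` on a local copy of the recordings should give a file count and total duration close to the reported values if the copy is complete.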
DOI: 10.1016/j.dib.2024.110753