Gigant-KTTS dataset: Towards building an extensive gigant dataset for Kurdish text-to-speech systems


Bibliographic details
Published in: Data in Brief, Volume 55; p. 110753
Main authors: Ahmad, Hawraz A.; Rashid, Tarik A.
Format: Journal Article
Language: English
Publication details: Netherlands, Elsevier Inc, 1 August 2024
Elsevier
ISSN: 2352-3409
Description
Summary: Today, speech synthesis is part of daily life on computers all around the world. The Central Kurdish Speech Corpus Construction is a speech corpus that serves as a primary data source for developing a speech system. Two main issues still prevent such systems from achieving the best possible performance: the lack of efficiency in training and analysis, and the difficulty of modelling. The biggest obstacle to text-to-speech in Kurdish is the lack of text and speech recognition tools, even though around 30 million people speak the language in different countries. To address this issue, this work introduces a large-vocabulary Kurdish Text-to-Speech dataset (KTTS, Gigant), including a pronunciation lexicon and a speech corpus for the Central Kurdish dialect. The sentences cover a variety of subjects. They were recorded in a voice recording studio by a Kurdish male dubbing artist. The goal of the speech corpus is to create a collection of sentences that accurately reflects real data on the Central Kurdish dialect. A combination of audio and visual sources was used to record the 6,078 sentences, which span 12 document topics. They were recorded in a controlled environment using low-noise microphones. The total recording duration is 13.63 h. The recorded sentences are in “.wav” format.
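The figures in the summary (6,078 sentences, 13.63 h, ".wav" files) imply an average of roughly 8 seconds per recorded sentence. The following is a minimal sketch, not part of the published dataset tooling, of how a reader might check such totals on a local copy of the corpus. The directory layout and the function names `total_wav_hours` and `mean_seconds_per_sentence` are assumptions made for illustration; only Python's standard `wave` and `pathlib` modules are used, and the files are assumed to be uncompressed PCM WAV.

```python
import wave
from pathlib import Path


def total_wav_hours(corpus_dir):
    """Return (file_count, total_hours) for all .wav files under corpus_dir.

    corpus_dir is a hypothetical local path; the real layout of the
    dataset is not described in this record.
    """
    count = 0
    seconds = 0.0
    for path in sorted(Path(corpus_dir).rglob("*.wav")):
        with wave.open(str(path), "rb") as wav_file:
            # Duration = number of frames / frames per second.
            seconds += wav_file.getnframes() / wav_file.getframerate()
        count += 1
    return count, seconds / 3600.0


def mean_seconds_per_sentence(total_hours, sentence_count):
    """Average duration of one recording, in seconds."""
    return total_hours * 3600.0 / sentence_count


if __name__ == "__main__":
    # Figures quoted in the summary: 13.63 h over 6,078 sentences.
    print(f"{mean_seconds_per_sentence(13.63, 6078):.2f} s per sentence")
```

Calling `total_wav_hours` on the directory that holds the recordings would give a file count and a duration to compare against the 6,078 sentences and 13.63 h reported above.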
DOI: 10.1016/j.dib.2024.110753