Spoken word corpus and dictionary definition for an African language

The preservation of languages is critical to maintaining and strengthening the cultures and identities of communities, and this is especially true for under-resourced languages with a predominantly oral culture. Most African languages have a relatively short literary past, and as such the task of di...

Bibliographic details
Published in: Journal of data mining and digital humanities, Volume Special Issue on Collecting,...; Issue Digital humanities in...
Main authors: Nganga, Wanjiku; Achebe, Ikechukwu
Format: Journal Article
Language: English
Publication details: INRIA 02.12.2020
Nicolas Turenne
ISSN: 2416-5999
Abstract: The preservation of languages is critical to maintaining and strengthening the cultures and identities of communities, and this is especially true for under-resourced languages with a predominantly oral culture. Most African languages have a relatively short literary past, and as such the task of dictionary making cannot rely on textual corpora as has been the standard practice in lexicography. This paper emphasizes the significance of the spoken word and the oral tradition as repositories of vocabulary, and argues that the value of spoken word corpora for lexicography greatly outweighs that of printed texts. We describe a methodology for creating a digital dialectal dictionary for the Igbo language from such a spoken word corpus. We also highlight the language technology tools and resources that have been created to support the transcription of thousands of hours of Igbo speech and the subsequent compilation of these transcriptions into an XML-encoded textual corpus of Igbo dialects. The methodology described in this paper can serve as a blueprint that can be adopted for other under-resourced languages that have predominantly oral cultures.
Author Nganga, Wanjiku
Achebe, Ikechukwu
Author_xml – sequence: 1
  givenname: Wanjiku
  surname: Nganga
  fullname: Nganga, Wanjiku
  organization: School of Computing & Informatics, University of Nairobi, Kenya
– sequence: 2
  givenname: Ikechukwu
  surname: Achebe
  fullname: Achebe, Ikechukwu
  organization: Nnamdi Azikiwe University
BackLink https://hal.science/hal-02912202 (View record in HAL)
ContentType Journal Article
Copyright Distributed under a Creative Commons Attribution 4.0 International License
DOI 10.46298/jdmdh.6703
DatabaseName CrossRef
Hyper Article en Ligne (HAL)
HAL-SHS: Archive ouverte en Sciences de l'Homme et de la Société
HAL-SHS: Archive ouverte en Sciences de l'Homme et de la Société (Open Access)
Hyper Article en Ligne (HAL) (Open Access)
DOAJ Directory of Open Access Journals
Discipline Computer Science
EISSN 2416-5999
ISSN 2416-5999
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue Digital humanities in...
Keywords audio corpus
under-resourced languages
Igbo
oral tradition
dictionary definition
digital resources
Language English
License Distributed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0
OpenAccessLink https://doaj.org/article/770e7681287f4e5ba317ae2412c95262
PublicationDate 2020-12-02
PublicationTitle Journal of data mining and digital humanities
PublicationYear 2020
Publisher INRIA
Nicolas Turenne
SubjectTerms [info.info-cl]computer science [cs]/computation and language [cs.cl]
[info]computer science [cs]
[shs.langue]humanities and social sciences/linguistics
[shs.museo]humanities and social sciences/cultural heritage and museology
audio corpus
Computation and Language
Computer Science
Cultural heritage and museology
dictionary definition
digital resources
Humanities and Social Sciences
igbo
Linguistics
oral tradition
under-resourced languages
Title Spoken word corpus and dictionary definition for an African language
URI https://hal.science/hal-02912202
https://doaj.org/article/770e7681287f4e5ba317ae2412c95262
Volume Special Issue on Collecting,...