Spoken word corpus and dictionary definition for an African language
| Published in: | Journal of data mining and digital humanities, Volume Special Issue on Collecting,...; Issue Digital humanities in... |
|---|---|
| Main authors: | Nganga, Wanjiku; Achebe, Ikechukwu |
| Format: | Journal Article |
| Language: | English |
| Publication details: | INRIA; Nicolas Turenne, 2 December 2020 |
| ISSN: | 2416-5999 |
| Online access: | Get full text |
| Abstract | The preservation of languages is critical to maintaining and strengthening the cultures and identities of communities, and this is especially true for under-resourced languages with a predominantly oral culture. Most African languages have a relatively short literary past, and as such the task of dictionary making cannot rely on textual corpora as has been the standard practice in lexicography. This paper emphasizes the significance of the spoken word and the oral tradition as repositories of vocabulary, and argues that spoken word corpora greatly outweigh the value of printed texts for lexicography. We describe a methodology for creating a digital dialectal dictionary for the Igbo language from such a spoken word corpus. We also highlight the language technology tools and resources that have been created to support the transcription of thousands of hours of Igbo speech and the subsequent compilation of these transcriptions into an XML-encoded textual corpus of Igbo dialects. The methodology described in this paper can serve as a blueprint that can be adopted for other under-resourced languages that have predominantly oral cultures. |
|---|---|
| Author | Nganga, Wanjiku; Achebe, Ikechukwu |
| Author_xml | 1. Nganga, Wanjiku (School of Computing & Informatics, University of Nairobi, Kenya); 2. Achebe, Ikechukwu (Nnamdi Azikiwe University) |
| BackLink | https://hal.science/hal-02912202 (View record in HAL) |
| ContentType | Journal Article |
| Copyright | Distributed under a Creative Commons Attribution 4.0 International License |
| DOI | 10.46298/jdmdh.6703 |
| DatabaseName | CrossRef; Hyper Article en Ligne (HAL); HAL-SHS: Archive ouverte en Sciences de l'Homme et de la Société; HAL-SHS: Archive ouverte en Sciences de l'Homme et de la Société (Open Access); Hyper Article en Ligne (HAL) (Open Access); DOAJ Directory of Open Access Journals |
| Discipline | Computer Science |
| EISSN | 2416-5999 |
| ISSN | 2416-5999 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | Digital humanities in... |
| Keywords | audio corpus; under-resourced languages; Igbo; oral tradition; dictionary definition; digital resources |
| Language | English |
| License | Distributed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0 |
| OpenAccessLink | https://doaj.org/article/770e7681287f4e5ba317ae2412c95262 |
| PublicationDate | 2020-12-02 |
| PublicationTitle | Journal of data mining and digital humanities |
| PublicationYear | 2020 |
| Publisher | INRIA; Nicolas Turenne |
| SubjectTerms | [info.info-cl] computer science [cs]/computation and language [cs.cl]; [info] computer science [cs]; [shs.langue] humanities and social sciences/linguistics; [shs.museo] humanities and social sciences/cultural heritage and museology; audio corpus; Computation and Language; Computer Science; Cultural heritage and museology; dictionary definition; digital resources; Humanities and Social Sciences; Igbo; Linguistics; oral tradition; under-resourced languages |
| Title | Spoken word corpus and dictionary definition for an African language |
| URI | https://hal.science/hal-02912202 https://doaj.org/article/770e7681287f4e5ba317ae2412c95262 |
| Volume | Special Issue on Collecting,... |