BPE and morphologically segmented phrase based statistical machine translation system for Indian languages to resource constrained language Bodo

Bibliographic details
Published in: Multimedia Tools and Applications, Vol. 84, No. 25, pp. 29715–29732
Main authors: Narzary, Sanjib; Brahma, Maharaj; Nandi, Sukumar; Som, Bidisha
Format: Journal Article
Language: English
Published: New York: Springer US, 1 July 2025
Springer Nature B.V.
ISSN: 1573-7721, 1380-7501
Description
Abstract: Machine translation serves as a crucial tool for bridging the gap between languages and facilitating the exchange of knowledge. Nevertheless, developing machine translation systems for languages with limited resources poses a significant challenge. This article describes the creation of a statistical machine translation system for Bodo, a low-resource language spoken in Northeast India, and details the challenges encountered along the way. To evaluate the quality of our low-resource machine translation, we used parallel corpora collected from IndoWordNet and refined, covering Bodo and 17 Indic languages. To improve translation quality and resolve out-of-vocabulary (OOV) issues, we incorporated byte pair encoding and a morphological analyzer into our translation system. Our efforts resulted in the development of an efficient translation system. With byte pair encoding and morpheme segmentation, we achieved a BLEU score of 18.35 from Hindi to Bodo and 16.91 from Nepali to Bodo, a significant improvement over the baseline results. This work also investigates the potential of incorporating target-side monolingual data to improve machine translation quality.
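The abstract credits byte pair encoding (BPE) with reducing out-of-vocabulary words. As a rough illustration only, the Python sketch below implements the commonly used character-level BPE merge-learning procedure with an end-of-word marker. This record does not say which BPE tool, merge count, or settings the authors used, so the function names (`learn_bpe`, `apply_bpe`, `merge_symbols`) and the toy English word counts are hypothetical and are not data from the article. The point it shows is that a word never seen in training can still be split into learned subword units instead of becoming an OOV token.

```python
from collections import Counter

Pair = tuple[str, str]


def merge_symbols(symbols: list[str], pair: Pair) -> list[str]:
    """Merge every left-to-right, non-overlapping occurrence of `pair`."""
    out: list[str] = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def learn_bpe(word_counts: dict[str, int], num_merges: int) -> list[Pair]:
    """Learn merge operations from a word-frequency table (hypothetical helper)."""
    # Each word starts as its characters plus an end-of-word marker.
    vocab = {w: list(w) + ["</w>"] for w in word_counts}
    merges: list[Pair] = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs: Counter = Counter()
        for w, symbols in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += word_counts[w]
        if not pairs:
            break
        # Most frequent pair wins; ties go to the pair counted first.
        best = max(pairs, key=pairs.__getitem__)
        merges.append(best)
        vocab = {w: merge_symbols(s, best) for w, s in vocab.items()}
    return merges


def apply_bpe(word: str, merges: list[Pair]) -> list[str]:
    """Segment a (possibly unseen) word by replaying merges in learned order."""
    symbols = list(word) + ["</w>"]
    for pair in merges:
        symbols = merge_symbols(symbols, pair)
    return symbols


# Toy English word counts (hypothetical; not data from the article).
counts = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges = learn_bpe(counts, num_merges=5)
print(merges)
print(apply_bpe("lowest", merges))  # "lowest" never occurs in `counts`
```

Running it prints the five learned merges `[('e', 's'), ('es', 't'), ('est', '</w>'), ('l', 'o'), ('lo', 'w')]` and segments the unseen word "lowest" as `['low', 'est</w>']`. The morphological analyzer mentioned in the abstract is a separate, linguistically informed segmenter and is not modelled by this frequency-driven sketch.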
DOI:10.1007/s11042-024-20277-w