BPE and morphologically segmented phrase based statistical machine translation system for Indian languages to resource constrained language Bodo


Bibliographic details
Published in: Multimedia Tools and Applications, Vol. 84, No. 25, pp. 29715-29732
Main authors: Narzary, Sanjib; Brahma, Maharaj; Nandi, Sukumar; Som, Bidisha
Format: Journal Article
Language: English
Published: New York: Springer US, 1 July 2025
Springer Nature B.V
ISSN: 1573-7721, 1380-7501
Description
Abstract: Machine translation serves as a crucial tool for bridging the gap between languages and facilitating the exchange of knowledge. Nevertheless, developing machine translation systems for languages with limited resources poses a significant challenge. This article delves into the creation of a statistical machine translation system for Bodo, a low-resource language spoken in Northeast India. It also details the challenges encountered during this undertaking. To evaluate the quality of our low-resource machine translation, we utilized parallel corpora collected and refined from Indo Wordnet for Bodo and the 17 Indic languages. To enhance translation quality and resolve out-of-vocabulary (OOV) issues, we incorporated byte pair encoding and a morphological analyzer into our translation system. Our efforts resulted in the development of an efficient translation system. With the use of byte pair encoding and morpheme segmentation, we were able to achieve a BLEU score of 18.35 from Hindi to Bodo and 16.91 from Nepali to Bodo, indicating a significant improvement in the quality of translations compared to baseline results. This work additionally investigates the potential of incorporating target-side monolingual data for enhancing machine translation quality.
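Note: The abstract credits byte pair encoding (BPE) with reducing out-of-vocabulary issues. The following is a minimal sketch of the general BPE idea, not the authors' implementation: it learns merge rules from a toy word-frequency table and then splits an unseen word into known subword units. The function names (learn_bpe, merge_pair, segment), the "</w>" end-of-word marker, and the toy English corpus are all assumptions made for illustration; the record does not state which tool or settings the article used.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merge rules from a {word: frequency} mapping."""
    # Start from characters, with a hypothetical end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Pick the most frequent pair; ties are broken deterministically.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        vocab = {merge_pair(symbols, best): freq for symbols, freq in vocab.items()}
    return merges


def segment(word, merges):
    """Split a (possibly unseen) word by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    # Toy corpus (assumption): word -> frequency.
    toy = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
    rules = learn_bpe(toy, num_merges=10)
    # "lowest" never occurs in the toy corpus, yet it is still representable
    # as a sequence of learned subword units rather than an unknown token.
    print(segment("lowest", rules))
```

Because any word can fall back to smaller units (ultimately single characters), a translation model trained on such segments has no truly unknown words, which is the property the abstract relies on when it mentions resolving OOV issues.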
DOI: 10.1007/s11042-024-20277-w