BPE and morphologically segmented phrase based statistical machine translation system for Indian languages to resource constrained language Bodo
| Published in: | Multimedia Tools and Applications, Vol. 84, No. 25, pp. 29715-29732 |
|---|---|
| Main authors: | , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | New York: Springer US (Springer Nature B.V), 1 July 2025 |
| ISSN: | 1573-7721, 1380-7501 |
| Online access: | Full text |
| Abstract: | Machine translation serves as a crucial tool for bridging the gap between languages and facilitating the exchange of knowledge. Nevertheless, developing machine translation systems for languages with limited resources poses a significant challenge. This article delves into the creation of a statistical machine translation system for Bodo, a low-resource language spoken in Northeast India. It also details the challenges encountered during this undertaking. To evaluate the quality of our low-resource machine translation, we utilized parallel corpora collected and refined from Indo Wordnet for Bodo and the 17 Indic languages. To enhance translation quality and resolve out-of-vocabulary (OOV) issues, we incorporated byte pair encoding and a morphological analyzer into our translation system. Our efforts resulted in the development of an efficient translation system. With the use of byte pair encoding and morpheme segmentation, we were able to achieve a BLEU score of 18.35 from Hindi to Bodo and 16.91 from Nepali to Bodo, indicating a significant improvement in the quality of translations compared to baseline results. This work additionally investigates the potential of incorporating target-side monolingual data for enhancing machine translation quality. |
|---|---|
| DOI: | 10.1007/s11042-024-20277-w |
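
The abstract names byte pair encoding (BPE) as one of the two techniques used against out-of-vocabulary words. As a rough illustration of the general idea only, and not the authors' pipeline, the following minimal Python sketch learns BPE merges from a word list and segments an unseen word into known subword units. The function names (`merge_pair`, `learn_bpe`, `apply_bpe`), the `</w>` end-of-word marker, and the toy English word list are all assumptions made for this example; none of them come from the article or its corpora.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge operations from a list of words (toy sketch)."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Pick the most frequent pair; ties are broken by the pair itself
        # so that the result is deterministic.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def apply_bpe(word, merges):
    """Segment a word by replaying the learned merges in the order they were learned."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the article.
    toy_words = ["low", "low", "lower", "lowest", "newer", "newer"]
    merges = learn_bpe(toy_words, num_merges=10)
    # "lowly" never appears in the training words, yet it still decomposes
    # into subword units instead of becoming an unknown token.
    print(apply_bpe("lowly", merges))
```

Because every word can fall back to single characters, a segmenter of this kind never emits an unknown token, which is the property that makes subword units attractive for low-resource pairs. In practice an established subword toolkit would normally be used instead of hand-written code.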