Enhancement of English-Bengali Machine Translation Leveraging Back-Translation.
Saved in:
| Title: | Enhancement of English-Bengali Machine Translation Leveraging Back-Translation. |
|---|---|
| Authors: | Mondal, Subrota Kumar, Wang, Chengwei, Chen, Yijun, Cheng, Yuning, Huang, Yanbo, Dai, Hong-Ning, Kabir, H. M. Dipu |
| Source: | Applied Sciences (2076-3417); Aug2024, Vol. 14 Issue 15, p6848, 23p |
| Subject terms: | DISTRIBUTION (Probability theory), STATISTICAL sampling, MACHINE translating, CORPORA, TRANSLATING & interpreting, TRANSLATORS |
| Abstract: | An English-Bengali machine translation (MT) application converts English text into a corresponding Bengali translation, and our goal is to build a better model for this task. MT for resource-rich language pairs, such as English-German, began decades ago; however, MT for languages lacking large parallel corpora remains challenging. In our study, we employed back-translation to improve translation accuracy. Back-translation yields a pseudo-parallel corpus, which can be added to the original dataset to obtain an augmented dataset. However, the new data can be regarded as noisy, because they are generated by models that are not as well trained or as carefully evaluated as human translators. Since the output of a translation model is a probability distribution over candidate words, different decoding methods are used to make the model more robust, such as beam search, top-k random sampling, and random sampling with temperature T. Notably, top-k random sampling and random sampling with temperature T are more commonly used and often better decoding methods than beam search. To this end, our study compares LSTM (Long Short-Term Memory, as a baseline) and Transformer. Our results show that Transformer (BLEU: 27.80 in validation, 1.33 in test) outperforms LSTM (3.62 in validation, 0.00 in test) by a large margin in the English-Bengali translation task. (Evaluating LSTM and Transformer without any augmented data is our baseline study.) We also incorporate two decoding methods, top-k random sampling and random sampling with temperature T, for back-translation to help improve the translation accuracy of the model. The results show that data generated by back-translation without top-k or temperature sampling ("no strategy") improve accuracy (BLEU 38.22, +10.42 on validation; 2.07, +0.74 on test). Specifically, back-translation with top-k sampling is less effective (k = 10, BLEU 29.43, +1.83 on validation; 1.36, +0.03 on test), while sampling with a suitable temperature achieves a higher score (T = 0.5, BLEU 35.02, +7.22 on validation; 2.35, +1.02 on test). This implies that in English-Bengali MT, we can augment the training set through back-translation using random sampling with a proper temperature T. [ABSTRACT FROM AUTHOR] |
| Copyright: | Copyright of Applied Sciences (2076-3417) is the property of MDPI and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.) |
| Database: | Complementary Index |
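The two sampling-based decoding strategies compared in the abstract (top-k random sampling and random sampling with temperature T) can be sketched as follows. This is a minimal illustration of the general techniques, not the authors' implementation; the function names and toy logit values are assumptions:

```python
import math
import random

def softmax(logits, T=1.0):
    """Turn raw model scores into probabilities. Dividing logits by the
    temperature T before normalizing sharpens the distribution for T < 1
    and flattens it for T > 1."""
    m = max(x / T for x in logits)  # subtract max for numerical stability
    exps = [math.exp(x / T - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_with_temperature(logits, T=0.5, rng=random):
    """Random sampling with temperature T: sample one token index from the
    temperature-rescaled distribution."""
    probs = softmax(logits, T)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

def sample_top_k(logits, k=10, rng=random):
    """Top-k random sampling: keep only the k highest-scoring tokens,
    renormalize their probabilities, and sample among them."""
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    probs = softmax([logits[i] for i in ranked])
    return ranked[rng.choices(range(len(ranked)), weights=probs, k=1)[0]]
```

In back-translation, such a sampler would replace greedy/beam decoding when generating the synthetic source side, injecting diversity into the pseudo-parallel corpus; the abstract's finding is that a moderate temperature (T = 0.5) works better here than top-k with k = 10.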
| DOI: | 10.3390/app14156848 |