Enhancement of English-Bengali Machine Translation Leveraging Back-Translation.

Saved in:
Bibliographic Details
Title: Enhancement of English-Bengali Machine Translation Leveraging Back-Translation.
Authors: Mondal, Subrota Kumar, Wang, Chengwei, Chen, Yijun, Cheng, Yuning, Huang, Yanbo, Dai, Hong-Ning, Kabir, H. M. Dipu
Source: Applied Sciences (2076-3417); Aug2024, Vol. 14 Issue 15, p6848, 23p
Subject Terms: DISTRIBUTION (Probability theory), STATISTICAL sampling, MACHINE translating, CORPORA, TRANSLATING & interpreting, TRANSLATORS
Abstract: An English-Bengali machine translation (MT) application converts an English text into a corresponding Bengali translation. To build a better model for this task, we can optimize English-Bengali MT. MT for high-resource language pairs, like English-German, started decades ago. However, MT for language pairs lacking large parallel corpora remains challenging. In our study, we employed back-translation to improve translation accuracy. Back-translation yields a pseudo-parallel corpus, and this generated (pseudo) corpus can be added to the original dataset to obtain an augmented dataset. However, the new data can be regarded as noisy, because they are generated by models that may not be trained or evaluated as thoroughly as human translators. Since the raw output of a translation model is a probability distribution over candidate words, different decoding methods are used to make the model more robust, such as beam search, top-k random sampling, and random sampling with temperature T. Notably, top-k random sampling and random sampling with temperature T are more commonly used, and often more effective, decoding methods than beam search. To this end, our study compares LSTM (Long Short-Term Memory, as a baseline) and Transformer. Our results show that Transformer (BLEU: 27.80 on validation, 1.33 on test) outperforms LSTM (3.62 on validation, 0.00 on test) by a large margin in the English-Bengali translation task. (Evaluating LSTM and Transformer without any augmented data is our baseline study.) We also incorporate two decoding methods, top-k random sampling and random sampling with temperature T, for back-translation, which help improve the translation accuracy of the model. The results show that data generated by back-translation without top-k or temperature sampling ("no strategy") help improve the accuracy (BLEU 38.22, +10.42 on validation; 2.07, +0.74 on test).
Specifically, back-translation with top-k sampling is less effective (k = 10, BLEU 29.43, +1.83 on validation; 1.36, +0.03 on test), while sampling with a proper temperature T makes the model achieve a higher score (T = 0.5, BLEU 35.02, +7.22 on validation; 2.35, +1.02 on test). This implies that in English-Bengali MT, we can augment the training set through back-translation using random sampling with a proper temperature T. [ABSTRACT FROM AUTHOR]
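To illustrate the two decoding strategies the abstract compares (not the authors' implementation), here is a minimal sketch of top-k random sampling and temperature sampling over a toy distribution of candidate-word scores; the score values and function names are illustrative assumptions:

```python
import math
import random


def temperature_sample(logits, T=0.5, rng=None):
    """Sample one index from logits rescaled by temperature T.

    Lower T sharpens the distribution (T -> 0 approaches argmax);
    T = 1 leaves the softmax distribution unchanged.
    """
    rng = rng or random.Random()
    scaled = [x / T for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Inverse-CDF sampling over the categorical distribution.
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(probs) - 1


def top_k_sample(logits, k=10, rng=None):
    """Keep only the k highest-scoring candidates, then sample among them."""
    rng = rng or random.Random()
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    sub = [logits[i] for i in top]
    j = temperature_sample(sub, T=1.0, rng=rng)
    return top[j]


# Toy scores for four candidate words at one decoding step.
logits = [2.0, 1.0, 0.5, -1.0]
print(top_k_sample(logits, k=1))          # k = 1 reduces to greedy decoding
print(temperature_sample(logits, T=0.5))  # some index in 0..3
```

In back-translation, such sampling-based decoding injects diversity into the synthetic source sentences, which is why the study contrasts it with deterministic strategies like beam search.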
Copyright of Applied Sciences (2076-3417) is the property of MDPI and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Complementary Index
FullText Text:
  Availability: 0
CustomLinks:
  – Url: https://resolver.ebscohost.com/openurl?sid=EBSCO:edb&genre=article&issn=20763417&ISBN=&volume=14&issue=15&date=20240801&spage=6848&pages=6848-6870&title=Applied Sciences (2076-3417)&atitle=Enhancement%20of%20English-Bengali%20Machine%20Translation%20Leveraging%20Back-Translation.&aulast=Mondal%2C%20Subrota%20Kumar&id=DOI:10.3390/app14156848
    Name: Full Text Finder
    Category: fullText
    Text: Full Text Finder
    Icon: https://imageserver.ebscohost.com/branding/images/FTF.gif
    MouseOverText: Full Text Finder
  – Url: https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=EBSCO&SrcAuth=EBSCO&DestApp=WOS&ServiceName=TransferToWoS&DestLinkType=GeneralSearchSummary&Func=Links&author=Mondal%20SK
    Name: ISI
    Category: fullText
    Text: Find this article in Web of Science
    Icon: https://imagesrvr.epnet.com/ls/20docs.gif
    MouseOverText: Find this article in Web of Science
Header DbId: edb
DbLabel: Complementary Index
An: 178949821
RelevancyScore: 983
AccessLevel: 6
PubType: Academic Journal
PubTypeId: academicJournal
PreciseRelevancyScore: 983.429809570313
IllustrationInfo
Items – Name: Title
  Label: Title
  Group: Ti
  Data: Enhancement of English-Bengali Machine Translation Leveraging Back-Translation.
– Name: Author
  Label: Authors
  Group: Au
  Data: <searchLink fieldCode="AR" term="%22Mondal%2C+Subrota+Kumar%22">Mondal, Subrota Kumar</searchLink><br /><searchLink fieldCode="AR" term="%22Wang%2C+Chengwei%22">Wang, Chengwei</searchLink><br /><searchLink fieldCode="AR" term="%22Chen%2C+Yijun%22">Chen, Yijun</searchLink><br /><searchLink fieldCode="AR" term="%22Cheng%2C+Yuning%22">Cheng, Yuning</searchLink><br /><searchLink fieldCode="AR" term="%22Huang%2C+Yanbo%22">Huang, Yanbo</searchLink><br /><searchLink fieldCode="AR" term="%22Dai%2C+Hong-Ning%22">Dai, Hong-Ning</searchLink><br /><searchLink fieldCode="AR" term="%22Kabir%2C+H%2E+M%2E+Dipu%22">Kabir, H. M. Dipu</searchLink>
– Name: TitleSource
  Label: Source
  Group: Src
  Data: Applied Sciences (2076-3417); Aug2024, Vol. 14 Issue 15, p6848, 23p
– Name: Subject
  Label: Subject Terms
  Group: Su
  Data: <searchLink fieldCode="DE" term="%22DISTRIBUTION+%28Probability+theory%29%22">DISTRIBUTION (Probability theory)</searchLink><br /><searchLink fieldCode="DE" term="%22STATISTICAL+sampling%22">STATISTICAL sampling</searchLink><br /><searchLink fieldCode="DE" term="%22MACHINE+translating%22">MACHINE translating</searchLink><br /><searchLink fieldCode="DE" term="%22CORPORA%22">CORPORA</searchLink><br /><searchLink fieldCode="DE" term="%22TRANSLATING+%26+interpreting%22">TRANSLATING & interpreting</searchLink><br /><searchLink fieldCode="DE" term="%22TRANSLATORS%22">TRANSLATORS</searchLink>
– Name: Abstract
  Label: Abstract
  Group: Ab
  Data: An English-Bengali machine translation (MT) application converts an English text into a corresponding Bengali translation. To build a better model for this task, we can optimize English-Bengali MT. MT for high-resource language pairs, like English-German, started decades ago. However, MT for language pairs lacking large parallel corpora remains challenging. In our study, we employed back-translation to improve translation accuracy. Back-translation yields a pseudo-parallel corpus, and this generated (pseudo) corpus can be added to the original dataset to obtain an augmented dataset. However, the new data can be regarded as noisy, because they are generated by models that may not be trained or evaluated as thoroughly as human translators. Since the raw output of a translation model is a probability distribution over candidate words, different decoding methods are used to make the model more robust, such as beam search, top-k random sampling, and random sampling with temperature T. Notably, top-k random sampling and random sampling with temperature T are more commonly used, and often more effective, decoding methods than beam search. To this end, our study compares LSTM (Long Short-Term Memory, as a baseline) and Transformer. Our results show that Transformer (BLEU: 27.80 on validation, 1.33 on test) outperforms LSTM (3.62 on validation, 0.00 on test) by a large margin in the English-Bengali translation task. (Evaluating LSTM and Transformer without any augmented data is our baseline study.) We also incorporate two decoding methods, top-k random sampling and random sampling with temperature T, for back-translation, which help improve the translation accuracy of the model. The results show that data generated by back-translation without top-k or temperature sampling ("no strategy") help improve the accuracy (BLEU 38.22, +10.42 on validation; 2.07, +0.74 on test).
Specifically, back-translation with top-k sampling is less effective (k = 10, BLEU 29.43, +1.83 on validation; 1.36, +0.03 on test), while sampling with a proper temperature T makes the model achieve a higher score (T = 0.5, BLEU 35.02, +7.22 on validation; 2.35, +1.02 on test). This implies that in English-Bengali MT, we can augment the training set through back-translation using random sampling with a proper temperature T. [ABSTRACT FROM AUTHOR]
– Name: Abstract
  Label:
  Group: Ab
  Data: <i>Copyright of Applied Sciences (2076-3417) is the property of MDPI and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract.</i> (Copyright applies to all Abstracts.)
PLink https://erproxy.cvtisr.sk/sfx/access?url=https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edb&AN=178949821
RecordInfo BibRecord:
  BibEntity:
    Identifiers:
      – Type: doi
        Value: 10.3390/app14156848
    Languages:
      – Code: eng
        Text: English
    PhysicalDescription:
      Pagination:
        PageCount: 23
        StartPage: 6848
    Subjects:
      – SubjectFull: DISTRIBUTION (Probability theory)
        Type: general
      – SubjectFull: STATISTICAL sampling
        Type: general
      – SubjectFull: MACHINE translating
        Type: general
      – SubjectFull: CORPORA
        Type: general
      – SubjectFull: TRANSLATING & interpreting
        Type: general
      – SubjectFull: TRANSLATORS
        Type: general
    Titles:
      – TitleFull: Enhancement of English-Bengali Machine Translation Leveraging Back-Translation.
        Type: main
  BibRelationships:
    HasContributorRelationships:
      – PersonEntity:
          Name:
            NameFull: Mondal, Subrota Kumar
      – PersonEntity:
          Name:
            NameFull: Wang, Chengwei
      – PersonEntity:
          Name:
            NameFull: Chen, Yijun
      – PersonEntity:
          Name:
            NameFull: Cheng, Yuning
      – PersonEntity:
          Name:
            NameFull: Huang, Yanbo
      – PersonEntity:
          Name:
            NameFull: Dai, Hong-Ning
      – PersonEntity:
          Name:
            NameFull: Kabir, H. M. Dipu
    IsPartOfRelationships:
      – BibEntity:
          Dates:
            – D: 01
              M: 08
              Text: Aug2024
              Type: published
              Y: 2024
          Identifiers:
            – Type: issn-print
              Value: 20763417
          Numbering:
            – Type: volume
              Value: 14
            – Type: issue
              Value: 15
          Titles:
            – TitleFull: Applied Sciences (2076-3417)
              Type: main
ResultId 1