Transfer learning and subword sampling for asymmetric-resource one-to-many neural translation

Detailed bibliography
Published in:	Machine Translation, vol. 34, no. 4, pp. 251-286
Main authors:	Grönroos, Stig-Arne; Virpioja, Sami; Kurimo, Mikko
Format:	Journal Article
Language:	English
Publication details:	Dordrecht: Springer Netherlands, 1 December 2020; Springer Nature B.V.
Subjects:	Artificial Intelligence; Asymmetry; Computational Linguistics; Computer Science; Czech language; Danish language; Denoising; English language; Estonian language; Experiments; Finnish language; Languages; Learning; Learning transfer; Machine translation; Monolingualism; Natural Language Processing (NLP); Noise reduction; Norwegian language; Parallel corpora; Pretraining; Regularization; Sami languages; Sampling; Segmentation; Slovak language; Swedish language; Translation; Translation methods and strategies; Vocabulary; Low-resource languages; Subword segmentation; Multilingual machine translation; Denoising sequence autoencoder; Transfer learning; Multi-task learning
ISSN:	0922-6567, 1573-0573

Description
Abstract:	There are several approaches for improving neural machine translation for low-resource languages: monolingual data can be exploited via pretraining or data augmentation; parallel corpora on related language pairs can be used via parameter sharing or transfer learning in multilingual models; and subword segmentation and regularization techniques can be applied to ensure high coverage of the vocabulary. We review these approaches in the context of an asymmetric-resource one-to-many translation task, in which the two target languages are related, one being very low-resource and the other higher-resource. We test various methods on three artificially restricted translation tasks and one real-world task. The restricted tasks are English to Estonian (low-resource) and Finnish (high-resource), English to Slovak and Czech, and English to Danish and Swedish; the real-world task is Norwegian to North Sámi and Finnish. The experiments show positive effects, especially for scheduled multi-task learning, the denoising autoencoder, and subword sampling.
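Subword sampling, one of the techniques the abstract singles out, means that each time a word is seen during training it may be split into a different sequence of subwords, drawn at random from a probabilistic segmentation model. The sketch below illustrates the idea with forward-filtering backward-sampling over a unigram subword model, in the style of unigram-language-model subword regularization (as in SentencePiece). It is an illustration under stated assumptions, not the segmentation model used in the article: the function name `sample_segmentation`, the toy lexicon and its probabilities, and the exponent `alpha` (which sharpens or flattens the distribution) are all hypothetical.

```python
import math
import random


def sample_segmentation(word, logp, alpha=1.0, rng=random):
    """Sample one segmentation of `word` from a unigram subword model.

    `logp` maps each subword string to its log-probability (a hypothetical
    lexicon). A segmentation s is drawn with probability proportional to
    P(s) ** alpha, where P(s) is the product of its subword probabilities.
    """
    n = len(word)
    # Forward pass: forward[i] is the log of the summed (alpha-scaled)
    # probability of all segmentations of the prefix word[:i].
    forward = [-math.inf] * (n + 1)
    forward[0] = 0.0
    for end in range(1, n + 1):
        scores = [forward[start] + alpha * logp[word[start:end]]
                  for start in range(end)
                  if word[start:end] in logp and forward[start] > -math.inf]
        if scores:
            m = max(scores)  # log-sum-exp for numerical stability
            forward[end] = m + math.log(sum(math.exp(s - m) for s in scores))
    if forward[n] == -math.inf:
        raise ValueError(f"no segmentation of {word!r} with this lexicon")

    # Backward pass: sample the last subword given the whole prefix mass,
    # then continue on the remaining prefix until the start of the word.
    pieces = []
    end = n
    while end > 0:
        starts = [s for s in range(end)
                  if word[s:end] in logp and forward[s] > -math.inf]
        weights = [math.exp(forward[s] + alpha * logp[word[s:end]] - forward[end])
                   for s in starts]
        start = rng.choices(starts, weights=weights)[0]
        pieces.append(word[start:end])
        end = start
    return pieces[::-1]


if __name__ == "__main__":
    # Hypothetical toy lexicon; single characters keep every word segmentable.
    probs = {"k": 0.05, "i": 0.05, "s": 0.1, "a": 0.1,
             "ki": 0.1, "ss": 0.1, "sa": 0.1, "kissa": 0.4}
    logp = {piece: math.log(p) for piece, p in probs.items()}
    rng = random.Random(0)
    for _ in range(3):
        print(sample_segmentation("kissa", logp, alpha=0.5, rng=rng))
```

A small `alpha` spreads probability toward rarer, more fragmented segmentations, which is what gives the regularizing effect; a large `alpha` makes sampling approach the single most probable segmentation.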
DOI:	10.1007/s10590-020-09253-x