Transfer learning and subword sampling for asymmetric-resource one-to-many neural translation

Published in: Machine Translation, Volume 34, Issue 4, pp. 251–286
Main authors: Grönroos, Stig-Arne; Virpioja, Sami; Kurimo, Mikko
Medium: Journal Article
Language: English
Publication details: Dordrecht: Springer Netherlands (Springer Nature B.V.), 01.12.2020
ISSN: 0922-6567 (print), 1573-0573 (electronic)
Abstract
There are several approaches for improving neural machine translation for low-resource languages: monolingual data can be exploited via pretraining or data augmentation; parallel corpora on related language pairs can be used via parameter sharing or transfer learning in multilingual models; subword segmentation and regularization techniques can be applied to ensure high coverage of the vocabulary. We review these approaches in the context of an asymmetric-resource one-to-many translation task, in which the pair of target languages are related, with one being a very low-resource and the other a higher-resource language. We test various methods on three artificially restricted translation tasks—English to Estonian (low-resource) and Finnish (high-resource), English to Slovak and Czech, English to Danish and Swedish—and one real-world task, Norwegian to North Sámi and Finnish. The experiments show positive effects especially for scheduled multi-task learning, denoising autoencoder, and subword sampling.
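The subword sampling highlighted in the abstract can be sketched in miniature. The toy implementation below illustrates unigram-LM subword sampling in the spirit of Kudo's (2018) subword regularization; it is not the authors' actual pipeline, and the vocabulary and probabilities are invented for the example.

```python
import math
import random

# Hypothetical unigram subword probabilities (illustrative values only).
vocab = {"un": 0.05, "translat": 0.04, "able": 0.06, "trans": 0.03,
         "lat": 0.02, "a": 0.03, "ble": 0.02, "untranslatable": 0.001}

def segmentations(word):
    """Enumerate every way to split `word` into in-vocabulary subwords."""
    if not word:
        return [[]]
    out = []
    for i in range(1, len(word) + 1):
        piece = word[:i]
        if piece in vocab:
            for rest in segmentations(word[i:]):
                out.append([piece] + rest)
    return out

def sample_segmentation(word, alpha=0.5, rng=random):
    """Sample a segmentation with probability proportional to the
    temperature-smoothed product of its pieces' unigram probabilities.
    Lower alpha flattens the distribution, so rarer segmentations are
    sampled more often (more regularization)."""
    segs = segmentations(word)
    scores = [math.prod(vocab[p] for p in seg) ** alpha for seg in segs]
    total = sum(scores)
    return rng.choices(segs, weights=[s / total for s in scores])[0]
```

During training, each occurrence of a word would be re-segmented by a fresh call to `sample_segmentation`, exposing the model to multiple consistent segmentations of the same surface form; real systems do this with a trained unigram model (e.g. SentencePiece) and dynamic-programming inference rather than explicit enumeration.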
Authors:
1. Grönroos, Stig-Arne (ORCID 0000-0002-3750-6924; stig-arne.gronroos@aalto.fi), Department of Signal Processing and Acoustics, Aalto University
2. Virpioja, Sami (ORCID 0000-0002-3568-150X), Department of Digital Humanities, University of Helsinki; Utopia Analytics
3. Kurimo, Mikko, Department of Signal Processing and Acoustics, Aalto University
Cited by (Crossref DOIs): 10.3103/S1060992X23030049; 10.3390/s21093063
Copyright: The Author(s), under exclusive licence to Springer Nature B.V., part of Springer Nature 2021.
DOI: 10.1007/s10590-020-09253-x
Discipline: Languages & Literatures; Computer Science
Funding: European Research Council (grant 771113); European Union's Horizon 2020 Research and Innovation Programme (grant 780069)
Keywords: Low-resource languages; Subword segmentation; Multilingual machine translation; Denoising sequence autoencoder; Transfer learning; Multi-task learning
Peer reviewed: Yes
Open access: Yes (https://aaltodoc.aalto.fi/handle/123456789/102739)
– reference: Sennrich R, Haddow B, Birch A (2016) Improving neural machine translation models with monolingual data. In: Proceedings of the 54th annual meeting of the association for computational linguistics (Volume 1: Long Papers), pp 86–96
– reference: Klein G, Kim Y, Deng Y, Senellart J, Rush AM (2017) OpenNMT: Open-source toolkit for neural machine translation. In: Proceedings of the annual meeting of the association for computational linguistics (ACL). https://doi.org/10.18653/v1/P17-4012,arXiv: 1701.02810
– reference: Kalchbrenner N, Blunsom P (2013) Recurrent continuous translation models. In: Proceedings of the 2013 conference on empirical methods in natural language processing (EMNLP), pp 1700–1709
– reference: Bojar O, Federmann C, Fishel M, Graham Y, Haddow B, Huck M, Koehn P, Monz C (2018) Findings of the 2018 conference on machine translation (wmt18). In: Proceedings of the third conference on machine translation, volume 2: shared task papers, association for computational linguistics, Belgium, Brussels, pp 272–307. http://www.aclweb.org/anthology/W18-6401
– reference: RissanenJStochastic complexity in statistical inquiry1989SingaporeWorld Scientific Series in Computer Science0800.68508
– reference: Creutz M, Lagus K (2002) Unsupervised discovery of morphemes. In: Proceedings of the ACL-02 workshop on morphological and phonological learning (MPL), Association for Computational Linguistics, Philadelphia, Pennsylvania, vol 6, pp 21–30, https://doi.org/10.3115/1118647.1118650, http://portal.acm.org/citation.cfm?doid=1118647.1118650
– reference: GagePA new algorithm for data compressionC Users J19941222338
– reference: Galuščáková P, Bojar O (2012) WMT 2011 testing set. http://hdl.handle.net/11858/00-097C-0000-0006-AADA-9, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University
– reference: DempsterAPLairdNMRubinDBMaximum likelihood from incomplete data via the EM algorithmJ R Stat Soc19773911385015370364.62022
– reference: Grönroos SA, Virpioja S, Kurimo M (2018) Cognate-aware morphological segmentation for multilingual neural translation. In: Proceedings of the third conference on machine translation, Association for Computational Linguistics, Brussels
– reference: Edunov S, Ott M, Auli M, Grangier D (2018) Understanding back-translation at scale. In: Proceedings of the 2018 conference on empirical methods in natural language processing (EMNLP), pp 489–500
– reference: GoldsmithJUnsupervised learning of the morphology of a natural languageComput Linguist2001272153198184150810.1162/089120101750300490
– reference: Sachan DS, Neubig G (2018) Parameter sharing methods for multilingual self-attentional translation models. In: Proceedings of the third conference on machine translation (WMT): research papers, pp 261–271, https://www.aclweb.org/anthology/W18-6327
– reference: Sriram A, Jun H, Satheesh S, Coates A (2017) Cold fusion: Training seq2seq models together with language models. In: Proceedings of the Interspeech 2018, https://arxiv.org/abs/1708.06426
– reference: Virpioja S, Turunen VT, Spiegler S, Kohonen O, Kurimo M (2011) Empirical comparison of evaluation methods for unsupervised learning of morphology. Traitement Automatique des Langues 52(2):45–90, http://www.atala.org/Empirical-Comparison-of-Evaluation
– reference: ForcadaMLGinestí-RosellMNordfalkJO’ReganJOrtiz-RojasSPérez-OrtizJASánchez-MartínezFRamírez-SánchezGTyersFMApertium: a free/open-source platform for rule-based machine translationMach Transl201125212714410.1007/s10590-011-9090-0
– reference: KarakantaADehdariJvan GenabithJNeural machine translation for low-resource languages without parallel corporaMach Transl2018321–216718910.1007/s10590-017-9203-5
– reference: Song K, Tan X, Qin T, Lu J, Liu TY (2019) Mass: Masked sequence to sequence pre-training for language generation. In: Proceedings of the international conference on machine learning (ICML), pp 5926–5936
– reference: Costa-jussà MR, Fonollosa JAR (2016) Character-based neural machine translation. In: Proceedings of the 54th annual meeting of the association for computational linguistics (ACL) (Volume 2: Short Papers), Association for Computational Linguistics, Berlin, Germany, pp 357–361, https://doi.org/10.18653/v1/P16-2058, https://www.aclweb.org/anthology/P16-2058
– reference: Firat O, Cho K, Bengio Y (2016) Multi-way, multilingual neural machine translation with a shared attention mechanism. In: Proceedings of the 2016 conference of the North American chapter of the association for computational linguistics: human language technologies, pp 866–875, http://arxiv.org/abs/1601.01073
– reference: Virpioja S, Väyrynen JJ, Creutz M, Sadeniemi M (2007) Morphology-aware statistical machine translation based on morphs induced in an unsupervised manner. Machine Translation Summit XI, Copenhagen, Denmark 2007:491–498
– reference: Chu C, Dabre R, Kurohashi S (2017) An empirical comparison of domain adaptation methods for neural machine translation. In: Proceedings of the 55th annual meeting of the association for computational linguistics (ACL) (Volume 2: Short Papers), Association for Computational Linguistics, Vancouver, pp 385–391, https://doi.org/10.18653/v1/P17-2061, https://www.aclweb.org/anthology/P17-2061
– reference: Salesky E, Runge A, Coda A, Niehues J, Neubig G (2020) Optimizing segmentation granularity for neural machine translation. Mach Transl pp 1–19
– reference: Luong MT (2016) Neural machine translation. PhD Thesis, Stanford University
– reference: SrivastavaNHintonGKrizhevskyASutskeverISalakhutdinovRDropout: a simple way to prevent neural networks from overfittingJ Mach Learn Res20141511929195832315921318.68153
– reference: Kurimo M, Virpioja S, Turunen V, Lagus K (2010) Morpho challenge 2005-2010: Evaluations and results. In: Heinz J, Cahill L, Wicentowski R (eds) Proceedings of the 11th meeting of the ACL special interest group on computational morphology and phonology, association for computational linguistics, Uppsala, Sweden, pp 87–95
– reference: Gulcehre C, Firat O, Xu K, Cho K, Barrault L, Lin HC, Bougares F, Schwenk H, Bengio Y (2015) On using monolingual corpora in neural machine translation. http://arxiv.org/abs/1503.03535
– reference: Sennrich R, Haddow B, Birch A (2015) Neural machine translation of rare words with subword units. In: Proceedings of the annual meeting of the association for computational linguistics (ACL), http://arxiv.org/abs/1508.07909
– reference: Grönroos SA, Virpioja S, Kurimo M (2020) Morfessor EM+Prune: improved subword segmentation with expectation maximization and pruning. In: Proceedings of the 12th language resources and evaluation conference, ELRA, Marseilles
– reference: Kudo T (2018) Subword regularization: Improving neural network translation models with multiple subword candidates. In: Proceedings of the 56th annual meeting of the association for computational linguistics (ACL) (Volume 1: Long Papers), pp 66–75, http://arxiv.org/abs/1804.10959
– reference: BourlardHKampYAuto-association by multilayer perceptrons and singular value decompositionBiol Cybern1988594–529129496112010.1007/BF00332918
– reference: Lison P, Tiedemann J (2016) OpenSubtitles2016: Extracting large parallel corpora from movie and tv subtitles. In: Proceedings of the 10th international conference on language resources and evaluation (LREC 2016), European Language Resources Association
– reference: KoehnPEuroparl: a parallel corpus for statistical machine translationMT Summit200557986
– reference: Bojar O, Dušek O, Kocmi T, Libovický J, Novák M, Popel M, Sudarikov R, Variš D (2016) CzEng 1.6: Enlarged Czech-English Parallel Corpus with Processing Tools Dockered. In: Sojka P, Horák A, Kopeček I, Pala K (eds) Text, speech, and dialogue: 19th international conference, TSD 2016, Masaryk University, Springer International Publishing, Cham/Heidelberg/New York/Dordrecht/London, no. 9924 in Lecture Notes in Artificial Intelligence, pp 231–238
– reference: Vincent P, Larochelle H, Bengio Y, Manzagol PA (2008) Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th international conference on machine learning (ICML), pp 1096–1103
– reference: KoponenMSalmiLNikulinMA product and process analysis of post-editor corrections on neural, statistical and rule-based machine translation outputMach Transl2019331–2619010.1007/s10590-019-09228-7
– reference: Papineni K, Roukos S, Ward T, Zhu WJ (2002) BLEU: a method for automatic evaluation of machine translation. In: 40th annual meeting of the association for computational linguistics, Association for Computational Linguistics, Philadelphia, pp 311–318, https://doi.org/10.3115/1073083.1073135, http://portal.acm.org/citation.cfm?doid=1073083.1073135
– reference: HammarströmHBorinLUnsupervised learning of morphologyComput Linguist2011372309350184150810.1162/COLI_a_00050
– reference: Zoph B, Yuret D, May J, Knight K (2016) Transfer learning for low-resource neural machine translation. In: Proceedings of the 2016 conference on empirical methods in natural language processing (EMNLP), pp 1568–1575
– reference: Vaibhav V, Singh S, Stewart C, Neubig G (2019) Improving robustness of machine translation with synthetic noise. In: Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, Volume 1 (Long and Short Papers), pp 1916–1920, https://www.aclweb.org/anthology/N19-1190/
– reference: Lee YS (2004) Morphological analysis for statistical machine translation. In: Proceedings of HLT-NAACL 2004: short papers, Association for Computational Linguistics, pp 57–60
– reference: Lewis M, Liu Y, Goyal N, Ghazvininejad M, Mohamed A, Levy O, Stoyanov V, Zettlemoyer L (2019) BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. ArXiv:1910.13461 [cs.CL], arXiv:1910.13461
– reference: Goodfellow IJ, Mirza M, Xiao D, Courville A, Bengio Y (2014) An empirical investigation of catastrophic forgetting in gradient-based neural networks. In: Proceedings of international conference on learning representations (ICLR), Citeseer, https://arxiv.org/abs/1312.6211
– reference: Chen Z, Badrinarayanan V, Lee CY, Rabinovich A (2018) Gradnorm: Gradient normalization for adaptive loss balancing in deep multitask networks. In: Proceedings of the international conference on machine learning (ICML), pp 794–803
– reference: Conneau A, Lample G (2019) Cross-lingual language model pretraining. In: Proceedings of advances in neural information processing systems (NIPS), pp 7059–7069, http://papers.nips.cc/paper/8928-cross-lingual-language-model-pretraining
– reference: Lample G, Conneau A, Denoyer L, Ranzato M (2018a) Unsupervised machine translation using monolingual corpora only. In: International conference on learning representations (ICLR), http://arxiv.org/abs/1711.00043
– reference: Zhang J, Zong C (2016) Exploiting source-side monolingual data in neural machine translation. In: Proceedings of the 2016 conference on empirical methods in natural language processing (EMNLP), pp 1535–1545
– reference: Chung J, Cho K, Bengio Y (2016) A character-level decoder without explicit segmentation for neural machine translation. In: Proceedings of the 54th annual meeting of the association for computational linguistics (ACL) (Volume 1: Long Papers), pp 1693–1703, http://arxiv.org/abs/1603.06147
– reference: Thompson B, Khayrallah H, Anastasopoulos A, McCarthy AD, Duh K, Marvin R, McNamee P, Gwinnup J, Anderson T, Koehn P (2018) Freezing subnetworks to analyze domain adaptation in neural machine translation. In: Proceedings of the third conference on machine translation (WMT): research papers, pp 124–132
– reference: Kudo T, Richardson J (2018) SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing. In: Proceedings of the 2018 conference on empirical methods in natural language processing: system demonstrations, association for computational linguistics, Brussels, Belgium, pp 66–71, https://doi.org/10.18653/v1/D18-2012, https://www.aclweb.org/anthology/D18-2012
– reference: Tu Z, Liu Y, Shang L, Liu X, Li H (2017) Neural machine translation with reconstruction. In: Thirty-first AAAI conference on artificial intelligence, http://arxiv.org/abs/1611.01874
– reference: Arivazhagan N, Bapna A, Firat O, Lepikhin D, Johnson M, Krikun M, Chen MX, Cao Y, Foster G, Cherry C, Macherey W, Chen Z, Wu Y (2019) Massively multilingual neural machine translation in the wild: Findings and challenges. arXiv:1907.05019 [cs.CL]
StartPage 251
SubjectTerms Artificial Intelligence
Asymmetry
Computational Linguistics
Computer Science
Czech language
Danish language
Denoising
English language
Estonian language
Experiments
Finnish language
Languages
Learning
Learning transfer
Machine translation
Monolingualism
Natural Language Processing (NLP)
Noise reduction
Norwegian language
Parallel corpora
Pretraining
Regularization
Sami languages
Sampling
Segmentation
Slovak language
Swedish language
Translation
Translation methods and strategies
Vocabulary
Title Transfer learning and subword sampling for asymmetric-resource one-to-many neural translation
URI https://link.springer.com/article/10.1007/s10590-020-09253-x
https://www.proquest.com/docview/2493492904
Volume 34