Syntax-based reordering for statistical machine translation
| Published in: | Computer Speech & Language, Volume 25, Issue 4, pp. 761-788 |
|---|---|
| Main authors: | , |
| Format: | Journal Article Publication |
| Language: | English |
| Published: | Kidlington: Elsevier Ltd, 1 October 2011 |
| Subjects: | |
| ISSN: | 0885-2308, 1095-8363 |
| Summary: | ► A syntax-driven approach to handling the problem of word ordering for statistical machine translation. ► The word order challenge is alleviated by including morpho-syntactical and statistical information in the context of a pre-translation reordering framework. ► The results are presented for small and large Chinese-to-English and Arabic-to-English translation tasks. ► The experiments are carried out on phrase-based and N-gram-based statistical machine translation systems.
In this paper, we develop an approach called syntax-based reordering (SBR) for handling the fundamental problem of word ordering in statistical machine translation (SMT). We propose to alleviate the word order challenge by including morpho-syntactical and statistical information in a pre-translation reordering framework aimed at capturing short- and long-distance word distortion dependencies. We examine the proposed approach from the theoretical and experimental points of view, discussing and analyzing its advantages and limitations in comparison with some of the state-of-the-art reordering methods.
In the final part of the paper, we describe the results of applying the syntax-based model to translation tasks with a great need for reordering (Chinese-to-English and Arabic-to-English). The experiments are carried out on standard phrase-based and alternative N-gram-based SMT systems. We first investigate sparse training data scenarios, in which the translation and reordering models are trained on sparse bilingual data, and then scale the method to a large training set, demonstrating that the improvement in translation quality is maintained. |
|---|---|
| DOI: | 10.1016/j.csl.2011.01.001 |