Syntax- and semantic-based reordering in hierarchical phrase-based statistical machine translation


Bibliographic details
Published in: Expert Systems with Applications, Volume 84, pp. 186-199
Main authors: Kazemi, Arefeh; Toral, Antonio; Way, Andy; Monadjemi, Amirhassan; Nematbakhsh, Mohammadali
Format: Journal Article
Language: English
Published: New York: Elsevier Ltd, 30 October 2017
Elsevier BV
ISSN: 0957-4174, 1873-6793
Description
Summary:
• A syntax-based reordering model (RM) for SMT systems is proposed.
• Our RM predicts the orientation between syntactic dependants of the source sentence.
• We enrich the proposed RM with semantic features, so it can perform semantic generalization.
• Our RM outperforms the baseline and two competing RMs in terms of BLEU and TER.

We present a syntax-based reordering model (RM) for hierarchical phrase-based statistical machine translation (HPB-SMT) enriched with semantic features. Our model brings a number of novel contributions: (i) while the previous dependency-based RM is limited to the reordering of head and dependant constituent pairs, we also model the reordering of pairs of dependants; (ii) our model is enriched with semantic features (WordNet synsets) so that it can generalize to pairs not seen in training but with equivalent meaning; (iii) we evaluate our model on two language directions, English-to-Farsi and English-to-Turkish. These language pairs are particularly challenging due to the free word order, rich morphology and lack of resources of the target languages. We evaluate our RM both intrinsically (accuracy of the RM classifier) and extrinsically (MT). Our best configuration outperforms the baseline classifier by 5–29% on pairs of dependants and by 12–30% on head and dependant pairs, while the improvement on MT ranges between 1.6% and 5.5% relative in terms of BLEU, depending on language pair and domain. We also analyze the values of the feature weights to gain further insight into the impact of the reordering-related features in the HPB-SMT model. We observe that the features of our RM are assigned significant weights and that they are complementary to the reordering feature included by default in the HPB-SMT model.
DOI: 10.1016/j.eswa.2017.05.001