A Substitution-Translation-Restoration Framework for Handling Unknown Words in Statistical Machine Translation

Bibliographic Details
Published in:	Journal of Computer Science and Technology, Vol. 28, No. 5, pp. 907-918
Main Authors:	Zhang, Jia-Jun; Zhai, Fei-Fei; Zong, Cheng-Qing
Format:	Journal Article
Language:	English
Published:	Boston: Springer US, 1 September 2013; Springer Nature B.V; National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
Subjects:	Artificial Intelligence; Bilingualism; Computer Science; Computer simulation; Data mining; Data Structures and Information Theory; Experiments; Handling; Information Systems Applications (incl. Internet); Language; Machine translation; Mathematical analysis; Mathematical models; Regular Paper; Semantics; Sentences; SMT; Software Engineering; Studies; Theory of Computation; Translations; Words (language); 框架; 统计机器翻译; 网络数据; 翻译质量; 翻译过程; 语义模型; 语言模型; Statistical machine translation; Distributional semantics; Bidirectional language model
ISSN:	1000-9000, 1860-4749

Description
Summary:	Unknown words are one of the key factors that greatly affect translation quality. Traditionally, nearly all related research has focused on obtaining translations of the unknown words. However, these approaches have two disadvantages. On the one hand, they usually rely on many additional resources such as bilingual web data; on the other hand, they cannot guarantee good reordering and lexical selection of the surrounding words. This paper gives a new perspective on handling unknown words in statistical machine translation (SMT). Instead of making great efforts to find the translation of unknown words, we focus on determining the semantic function of the unknown word in the test sentence and keeping that semantic function unchanged in the translation process. In this way, unknown words can help the phrase reordering and lexical selection of their surrounding words even though they remain untranslated. To determine the semantic function of an unknown word, we employ the distributional semantic model and the bidirectional language model. Extensive experiments on both phrase-based and linguistically syntax-based SMT models in Chinese-to-English translation show that our method can substantially improve translation quality.
Notes:	11-2296/TP; Keywords: statistical machine translation, distributional semantics, bidirectional language model; Jia-Jun Zhang, Member, CCF, Fei-Fei Zhai and Cheng-Qing Zong, Senior Member, CCF (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; E-mail: {jjzhang, ffzhai, cqzong}@nlpr.ia.ac.cn; Received December 4, 2012; revised May 7, 2013)
DOI:	10.1007/s11390-013-1386-5
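
Illustration:	The summary describes substituting an unknown word with an in-vocabulary word of similar semantic function, translating the sentence, and restoring the unknown word in the output. The following is a minimal sketch of that idea under loud assumptions, not the authors' implementation: the record gives no details of their models, so the substitute is picked here by cosine similarity of simple context-count vectors (a basic distributional model), the bidirectional language model is omitted, and `translate` is a hypothetical word-by-word stand-in for an SMT decoder that returns (target word, source index) pairs. All function names, the toy corpus, and the toy lexicon are invented for illustration.

```python
from collections import Counter
from math import sqrt


def context_vector(word, sentences, window=2):
    """Count the words seen within `window` positions of `word`."""
    counts = Counter()
    for sent in sentences:
        for i, tok in enumerate(sent):
            if tok != word:
                continue
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            counts.update(sent[lo:i] + sent[i + 1:hi])
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def choose_substitute(unknown, vocab, corpus):
    """Pick the in-vocabulary word whose contexts look most like the unknown word's."""
    target = context_vector(unknown, corpus)
    # sorted() makes tie-breaking deterministic.
    return max(sorted(vocab), key=lambda w: cosine(target, context_vector(w, corpus)))


def substitute_translate_restore(sentence, vocab, corpus, translate):
    """Substitute unknown words, translate, then put the unknown words back."""
    substituted, originals = [], {}
    for i, tok in enumerate(sentence):
        if tok in vocab:
            substituted.append(tok)
        else:
            originals[i] = tok  # remember what was replaced, by source position
            substituted.append(choose_substitute(tok, vocab, corpus))
    aligned = translate(substituted)  # list of (target_word, source_index)
    # Restoration: output aligned to a substituted position gets the original word.
    return [originals.get(src, tgt) for tgt, src in aligned]


# Toy data (hypothetical): "matcha" is the unknown word.
corpus = [
    ["i", "drink", "coffee", "daily"],
    ["i", "read", "books", "daily"],
    ["i", "drink", "matcha", "daily"],
]
vocab = {"i", "drink", "coffee", "read", "books", "daily"}
lexicon = {"i": "je", "drink": "bois", "coffee": "cafe", "daily": "quotidiennement"}


def toy_translate(tokens):
    """Hypothetical stand-in for an SMT decoder with word alignments."""
    return [(lexicon[tok], i) for i, tok in enumerate(tokens)]


result = substitute_translate_restore(
    ["i", "drink", "matcha", "daily"], vocab, corpus, toy_translate
)
print(result)
```

In this toy run the unknown word is replaced by "coffee" before translation, so the surrounding words are translated in a familiar context, and the untranslated "matcha" is restored at the aligned output position afterwards.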