A neural generative autoencoder for bilingual word embeddings

Bibliographic details
Published in: Information sciences, Vol. 424, pp. 287-300
Main authors: Su, Jinsong; Wu, Shan; Zhang, Biao; Wu, Changxing; Qin, Yue; Xiong, Deyi
Format: Journal Article
Language: English
Published: Elsevier Inc, 01.01.2018
ISSN: 0020-0255, 1872-6291
Description
Summary: Bilingual word embeddings (BWEs) have been shown to be useful in various cross-lingual natural language processing tasks. To learn BWEs accurately, previous studies often resort to discriminative approaches which explore semantic proximities between translation equivalents of different languages. Instead, in this paper, we propose a neural generative bilingual autoencoder (NGBAE) which introduces a latent variable to explicitly induce the underlying semantics of bilingual text. In this way, NGBAE is able to obtain better BWEs from more robust bilingual semantics by modeling the semantic distributions of bilingual text. To facilitate scalable inference and learning, we utilize deep neural networks to perform the recognition and generation procedures, and then employ the stochastic gradient variational Bayes algorithm to optimize them jointly. We validate the proposed model via both extrinsic (cross-lingual document classification and translation probability modeling) and intrinsic (word embedding analysis) evaluations. Experimental results demonstrate the effectiveness of NGBAE in learning BWEs.
DOI: 10.1016/j.ins.2017.09.070