Crosslingual and Multilingual Construction of Syntax-Based Vector Space Models

Bibliographic Details
Published in:	Transactions of the Association for Computational Linguistics, Volume 2, pp. 245-258
Main Authors:	Utt, Jason; Padó, Sebastian
Format:	Journal Article
Language:	English
Published:	One Rogers Street, Cambridge, MA 02142-1209, USA: MIT Press, 01.03.2021
Subjects:	Benchmarks; Comorbidity; Computational linguistics; English language; German language; Languages; Lexical semantics; Model accuracy; Monolingualism; Semantics; Serbo-Croatian language; Syntactic structures; Syntax; Vector spaces
ISSN:	2307-387X

Description
Summary:	Syntax-based distributional models of lexical semantics provide a flexible and linguistically adequate representation of co-occurrence information. However, their construction requires large, accurately parsed corpora, which are unavailable for most languages. In this paper, we develop a number of methods to overcome this obstacle. We describe (a) a crosslingual approach that constructs a syntax-based model for a new language, requiring only an English resource and a translation lexicon; and (b) multilingual approaches that combine crosslingual with monolingual information, subject to availability. We evaluate on two lexical semantic benchmarks in German and Croatian. We find that the models exhibit complementary profiles: crosslingual models yield higher accuracies, while monolingual models provide better coverage. In addition, we show that simple multilingual models can successfully combine their strengths.
Bibliography:	Volume 2, 2014
DOI:	10.1162/tacl_a_00180