Crosslingual and Multilingual Construction of Syntax-Based Vector Space Models
| Published in: | Transactions of the Association for Computational Linguistics, Vol. 2, pp. 245-258 |
|---|---|
| Main authors: | |
| Format: | Journal Article |
| Language: | English |
| Published: | MIT Press, One Rogers Street, Cambridge, MA 02142-1209, USA; MIT Press Journals, The MIT Press; 1 March 2021 |
| Subjects: | |
| ISSN: | 2307-387X |
| Abstract: | Syntax-based distributional models of lexical semantics provide a flexible and linguistically adequate representation of co-occurrence information. However, their construction requires large, accurately parsed corpora, which are unavailable for most languages. In this paper, we develop a number of methods to overcome this obstacle. We describe (a) a crosslingual approach that constructs a syntax-based model for a new language requiring only an English resource and a translation lexicon; and (b) multilingual approaches that combine crosslingual with monolingual information, subject to availability. We evaluate on two lexical semantic benchmarks in German and Croatian. We find that the models exhibit complementary profiles: crosslingual models yield higher accuracies, while monolingual models provide better coverage. In addition, we show that simple multilingual models can successfully combine their strengths. |
| Bibliography: | Vol. 2, 2014 |
| DOI: | 10.1162/tacl_a_00180 |
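The abstract names two ideas without giving implementation detail. The first, (a), builds a target-language syntax-based model from an English one plus a translation lexicon. The second, (b), combines crosslingual and monolingual information so that the accuracy of one model and the coverage of the other complement each other. The toy Python sketch below illustrates both under our own simplifying assumptions and is not the paper's method. Several choices are hypothetical and made only for illustration: the triple representation, the function names (`translate_model`, `vector`, `cosine`, `backoff_similarity`), copying relation labels unchanged across languages, splitting weight uniformly over ambiguous translations, and the backoff rule. The tiny English/German data is invented.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: (target word, dependency relation, context word) -> weight.
Triple = Tuple[str, str, str]
Model = Dict[Triple, float]


def translate_model(en_model: Model, lexicon: Dict[str, List[str]]) -> Model:
    """Project English syntax-based triples into a target language.

    Assumptions (illustrative only, not from the paper): relation labels are
    copied unchanged, triples containing an untranslatable word are dropped,
    and a triple's weight is split uniformly over all translation pairs.
    """
    target: Dict[Triple, float] = defaultdict(float)
    for (word, rel, ctx), weight in en_model.items():
        word_tr = lexicon.get(word, [])
        ctx_tr = lexicon.get(ctx, [])
        if not word_tr or not ctx_tr:
            continue  # no translation available: drop the triple
        share = weight / (len(word_tr) * len(ctx_tr))
        for tw in word_tr:
            for tc in ctx_tr:
                target[(tw, rel, tc)] += share
    return dict(target)


def vector(model: Model, word: str) -> Dict[Tuple[str, str], float]:
    """Collect a word's (relation, context) -> weight entries."""
    return {(rel, ctx): wt for (w, rel, ctx), wt in model.items() if w == word}


def cosine(u: Dict, v: Dict) -> Optional[float]:
    """Cosine similarity; None if either vector is empty or all-zero (no coverage)."""
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return None
    dot = sum(val * v.get(key, 0.0) for key, val in u.items())
    return dot / (norm_u * norm_v)


def backoff_similarity(cross: Model, mono: Model, w1: str, w2: str) -> Optional[float]:
    """Prefer the crosslingual model; fall back to the monolingual one
    when the crosslingual model does not cover both words."""
    sim = cosine(vector(cross, w1), vector(cross, w2))
    if sim is None:
        sim = cosine(vector(mono, w1), vector(mono, w2))
    return sim


# Invented toy data: an English model, an English->German lexicon,
# and a small German monolingual model.
en_model = {
    ("dog", "sbj_of", "bark"): 4.0,
    ("dog", "obj_of", "feed"): 2.0,
    ("cat", "obj_of", "feed"): 3.0,
    ("cat", "sbj_of", "purr"): 5.0,
}
lexicon = {"dog": ["Hund"], "cat": ["Katze"], "bark": ["bellen"],
           "feed": ["füttern"], "purr": ["schnurren"]}
cross = translate_model(en_model, lexicon)
mono = {("Hund", "sbj_of", "bellen"): 1.0, ("Welpe", "sbj_of", "bellen"): 2.0}

print(backoff_similarity(cross, mono, "Hund", "Katze"))  # crosslingual: 6 / sqrt(680)
print(backoff_similarity(cross, mono, "Hund", "Welpe"))  # 1.0 (monolingual fallback)
```

Backoff is one of the simplest possible combination strategies. It is used here only because it mirrors the abstract's finding in miniature: the crosslingual model answers where it has coverage, and the monolingual model fills gaps such as Welpe, which the toy lexicon cannot reach.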