Crosslingual and Multilingual Construction of Syntax-Based Vector Space Models

Syntax-based distributional models of lexical semantics provide a flexible and linguistically adequate representation of co-occurrence information. However, their construction requires large, accurately parsed corpora, which are unavailable for most languages. In this paper, we develop a number of m...

Bibliographic details
Published in: Transactions of the Association for Computational Linguistics, Volume 2; pp. 245-258
Main authors: Utt, Jason; Padó, Sebastian
Format: Journal Article
Language: English
Published: One Rogers Street, Cambridge, MA 02142-1209, USA: MIT Press, 01.03.2021
MIT Press Journals, The
The MIT Press
ISSN: 2307-387X
Abstract: Syntax-based distributional models of lexical semantics provide a flexible and linguistically adequate representation of co-occurrence information. However, their construction requires large, accurately parsed corpora, which are unavailable for most languages. In this paper, we develop a number of methods to overcome this obstacle. We describe (a) a crosslingual approach that constructs a syntax-based model for a new language requiring only an English resource and a translation lexicon; and (b) multilingual approaches that combine crosslingual with monolingual information, subject to availability. We evaluate on two lexical semantic benchmarks in German and Croatian. We find that the models exhibit complementary profiles: crosslingual models yield higher accuracies while monolingual models provide better coverage. In addition, we show that simple multilingual models can successfully combine their strengths.
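Illustrative sketch (not taken from the paper): the abstract describes building a model for a new language from an English syntax-based resource plus a translation lexicon, and combining it with monolingual information where available. The Python below is one minimal way such a pipeline could look. Everything in it is an assumption made for illustration: the function names, the toy data, pooling by summing the vectors of all translations, and the back-off order (crosslingual first, monolingual second, motivated only by the abstract's remark that crosslingual models are more accurate and monolingual models have better coverage). The paper's actual procedure may differ.

```python
from collections import defaultdict
from math import sqrt


def translate_model(english_model, lexicon):
    """Hypothetical crosslingual step: give each target-language word the
    summed English vectors of its translations. Contexts stay in English."""
    target_model = {}
    for target_word, translations in lexicon.items():
        vector = defaultdict(float)
        for english_word in translations:
            for context, weight in english_model.get(english_word, {}).items():
                vector[context] += weight
        if vector:  # skip words none of whose translations are in the model
            target_model[target_word] = dict(vector)
    return target_model


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(context, 0.0) for context, weight in u.items())
    norm_u = sqrt(sum(weight * weight for weight in u.values()))
    norm_v = sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_with_backoff(word1, word2, crosslingual, monolingual):
    """Hypothetical multilingual combination: use the crosslingual model when
    it covers both words, otherwise fall back to the monolingual model."""
    for model in (crosslingual, monolingual):
        if word1 in model and word2 in model:
            return cosine(model[word1], model[word2])
    return None  # neither model covers the pair


# Toy data (invented): contexts are (dependency relation, word) pairs.
english_model = {
    "dog": {("subj_of", "bark"): 2.0, ("obj_of", "feed"): 1.0},
    "cat": {("subj_of", "meow"): 2.0, ("obj_of", "feed"): 1.0},
}
lexicon = {"Hund": ["dog"], "Katze": ["cat"], "Igel": ["hedgehog"]}
monolingual = {
    "Hund": {("subj_of", "bellen"): 1.0, ("obj_of", "füttern"): 1.0},
    "Maus": {("obj_of", "füttern"): 1.0},
}

crosslingual = translate_model(english_model, lexicon)
print(similarity_with_backoff("Hund", "Katze", crosslingual, monolingual))
print(similarity_with_backoff("Hund", "Maus", crosslingual, monolingual))
```

In this toy setup the pair Hund/Katze is scored by the crosslingual model, while Hund/Maus falls back to the monolingual model because the lexicon has no entry for Maus.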
Authors:
1. Utt, Jason (Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart), uttjn@ims.uni-stuttgart.de
2. Padó, Sebastian (Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart), pado@ims.uni-stuttgart.de
ContentType Journal Article
Copyright 2014. This work is published under https://creativecommons.org/licenses/by/4.0/legalcode (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
DOI 10.1162/tacl_a_00180
EISSN 2307-387X
EndPage 258
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Language English
Notes Volume, 2014
OpenAccessLink https://doaj.org/article/2358165c96e544e9a7648e37362b5ad4
PageCount 14
StartPage 245
SubjectTerms Benchmarks
Comorbidity
Computational linguistics
English language
German language
Languages
Lexical semantics
Model accuracy
Monolingualism
Semantics
Serbo-Croatian language
Syntactic structures
Syntax
Vector spaces
URI https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00180
https://www.proquest.com/docview/2893925007
https://doaj.org/article/2358165c96e544e9a7648e37362b5ad4
Volume 2