Crosslingual and Multilingual Construction of Syntax-Based Vector Space Models
Syntax-based distributional models of lexical semantics provide a flexible and linguistically adequate representation of co-occurrence information. However, their construction requires large, accurately parsed corpora, which are unavailable for most languages. In this paper, we develop a number of m...
| Published in: | Transactions of the Association for Computational Linguistics Vol. 2; pp. 245 - 258 |
|---|---|
| Main Authors: | Utt, Jason; Padó, Sebastian |
| Format: | Journal Article |
| Language: | English |
| Published: | One Rogers Street, Cambridge, MA 02142-1209, USA; MIT Press; 01.03.2021; MIT Press Journals, The; The MIT Press |
| Subjects: | |
| ISSN: | 2307-387X |
| Abstract | Syntax-based distributional models of lexical semantics provide a flexible and linguistically adequate representation of co-occurrence information. However, their construction requires large, accurately parsed corpora, which are unavailable for most languages. In this paper, we develop a number of methods to overcome this obstacle. We describe (a) a crosslingual approach that constructs a syntax-based model for a new language requiring only an English resource and a translation lexicon; and (b) multilingual approaches that combine crosslingual with monolingual information, subject to availability. We evaluate on two lexical semantic benchmarks in German and Croatian. We find that the models exhibit complementary profiles: crosslingual models yield higher accuracies while monolingual models provide better coverage. In addition, we show that simple multilingual models can successfully combine their strengths. |
|---|---|
| Author | Padó, Sebastian; Utt, Jason |
| Author_xml | 1. Utt, Jason (uttjn@ims.uni-stuttgart.de), Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart; 2. Padó, Sebastian (pado@ims.uni-stuttgart.de), Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart |
| ContentType | Journal Article |
| Copyright | 2014. This work is published under https://creativecommons.org/licenses/by/4.0/legalcode (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
| DOI | 10.1162/tacl_a_00180 |
| EISSN | 2307-387X |
| EndPage | 258 |
| ISSN | 2307-387X |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Language | English |
| OpenAccessLink | https://www.proquest.com/docview/2893925007?pq-origsite=%requestingapplication% |
| PageCount | 14 |
| PublicationCentury | 2000 |
| PublicationDate | 2021-03-01 |
| PublicationPlace | One Rogers Street, Cambridge, MA 02142-1209, USA |
| PublicationTitle | Transactions of the Association for Computational Linguistics |
| PublicationYear | 2021 |
| Publisher | MIT Press; MIT Press Journals, The; The MIT Press |
| StartPage | 245 |
| SubjectTerms | Benchmarks; Comorbidity; Computational linguistics; English language; German language; Languages; Lexical semantics; Model accuracy; Monolingualism; Semantics; Serbo-Croatian language; Syntactic structures; Syntax; Vector spaces |
| Title | Crosslingual and Multilingual Construction of Syntax-Based Vector Space Models |
| URI | https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00180 https://www.proquest.com/docview/2893925007 https://doaj.org/article/2358165c96e544e9a7648e37362b5ad4 |
| Volume | 2 |