Deterministic binary vectors for efficient automated indexing of MEDLINE/PubMed abstracts
The need to maintain accessibility of the biomedical literature has led to development of methods to assist human indexers by recommending index terms for newly encountered articles. Given the rapid expansion of this literature, it is essential that these methods be scalable. Document vector represe...
| Published in: | AMIA ... Annual Symposium proceedings Vol. 2012; p. 940 |
|---|---|
| Main authors: | Wahle, Manuel; Widdows, Dominic; Herskovic, Jorge R; Bernstam, Elmer V; Cohen, Trevor |
| Format: | Journal Article |
| Language: | English |
| Published: | United States, 2012 |
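| Subjects: | Abstracting and Indexing - methods; Medical Subject Headings; MEDLINE; Models, Theoretical; Natural Language Processing; PubMed |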
| ISSN: | 1942-597X, 1559-4076 |
| Abstract | The need to maintain accessibility of the biomedical literature has led to development of methods to assist human indexers by recommending index terms for newly encountered articles. Given the rapid expansion of this literature, it is essential that these methods be scalable. Document vector representations are commonly used for automated indexing, and Random Indexing (RI) provides the means to generate them efficiently. However, RI is difficult to implement in real-world indexing systems, as (1) efficient nearest-neighbor search requires retaining all document vectors in RAM, and (2) it is necessary to maintain a store of randomly generated term vectors to index future documents. Motivated by these concerns, this paper documents the development and evaluation of a deterministic binary variant of RI. The increased capacity demonstrated by binary vectors has implications for information retrieval, and the elimination of the need to retain term vectors facilitates distributed implementations, enhancing the scalability of RI. |
|---|---|
| Author | Wahle, Manuel; Cohen, Trevor; Herskovic, Jorge R; Bernstam, Elmer V; Widdows, Dominic |
| Author_xml | – sequence: 1 givenname: Manuel surname: Wahle fullname: Wahle, Manuel organization: The University of Texas Health Science Center at Houston, School of Biomedical Informatics, USA – sequence: 2 givenname: Dominic surname: Widdows fullname: Widdows, Dominic – sequence: 3 givenname: Jorge R surname: Herskovic fullname: Herskovic, Jorge R – sequence: 4 givenname: Elmer V surname: Bernstam fullname: Bernstam, Elmer V – sequence: 5 givenname: Trevor surname: Cohen fullname: Cohen, Trevor |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/23304369 |
| ContentType | Journal Article |
| DBID | CGR CUY CVF ECM EIF NPM 7X8 |
| DatabaseName | Medline MEDLINE MEDLINE (Ovid) MEDLINE MEDLINE PubMed MEDLINE - Academic |
| DatabaseTitle | MEDLINE Medline Complete MEDLINE with Full Text PubMed MEDLINE (Ovid) MEDLINE - Academic |
| DatabaseTitleList | MEDLINE - Academic MEDLINE |
| Database_xml | – sequence: 1 dbid: NPM name: PubMed url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed sourceTypes: Index Database – sequence: 2 dbid: 7X8 name: MEDLINE - Academic url: https://search.proquest.com/medline sourceTypes: Aggregation Database |
| DeliveryMethod | no_fulltext_linktorsrc |
| Discipline | Medicine |
| EISSN | 1559-4076 |
| ExternalDocumentID | 23304369 |
| Genre | Journal Article; Research Support, N.I.H., Extramural |
| GrantInformation_xml | – fundername: NLM NIH HHS grantid: R21 LM010826 – fundername: NLM NIH HHS grantid: R21LM010826-01 – fundername: NCRR NIH HHS grantid: UL1 RR024148 |
| IEDL.DBID | 7X8 |
| ISSN | 1942-597X |
| IngestDate | Thu Jul 10 22:54:18 EDT 2025 Mon Jun 09 02:51:54 EDT 2025 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Language | English |
| LinkModel | DirectLink |
| PMID | 23304369 |
| PQID | 1273396635 |
| PQPubID | 23479 |
| ParticipantIDs | proquest_miscellaneous_1273396635 pubmed_primary_23304369 |
| PublicationCentury | 2000 |
| PublicationDate | 2012-00-00 |
| PublicationDateYYYYMMDD | 2012-01-01 |
| PublicationDate_xml | – year: 2012 text: 2012-00-00 |
| PublicationDecade | 2010 |
| PublicationPlace | United States |
| PublicationPlace_xml | – name: United States |
| PublicationTitle | AMIA ... Annual Symposium proceedings |
| PublicationTitleAlternate | AMIA Annu Symp Proc |
| PublicationYear | 2012 |
| SSID | ssj0047586 |
| Score | 1.8685821 |
| SourceID | proquest pubmed |
| SourceType | Aggregation Database Index Database |
| StartPage | 940 |
| SubjectTerms | Abstracting and Indexing - methods; Medical Subject Headings; MEDLINE; Models, Theoretical; Natural Language Processing; PubMed |
| Title | Deterministic binary vectors for efficient automated indexing of MEDLINE/PubMed abstracts |
| URI | https://www.ncbi.nlm.nih.gov/pubmed/23304369 https://www.proquest.com/docview/1273396635 |
| Volume | 2012 |
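
The abstract's two central ideas, term vectors that can be regenerated on demand rather than stored and compact binary document vectors, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the abstract does not say how determinism is achieved, so this sketch seeds a pseudo-random generator with a hash of the term, and the dimensionality `DIM`, the majority-vote superposition, the tie-breaking rule, and the function names are all hypothetical.

```python
import hashlib
import random

DIM = 256  # hypothetical dimensionality, chosen only for this sketch


def term_vector(term: str, dim: int = DIM) -> int:
    """Derive a binary vector for `term`, packed into a Python int.

    Assumption: the vector is obtained by seeding a PRNG with a hash of
    the term, so any machine can regenerate it without a stored table.
    """
    digest = hashlib.sha256(term.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest, "big"))
    return rng.getrandbits(dim)


def document_vector(terms, dim: int = DIM) -> int:
    """Superpose term vectors by a per-bit majority vote."""
    counts = [0] * dim
    for term in terms:
        vec = term_vector(term, dim)
        for i in range(dim):
            counts[i] += 1 if (vec >> i) & 1 else -1
    doc = 0
    for i, count in enumerate(counts):
        # Ties (possible with an even number of terms) fall to 0 here;
        # this tie-breaking rule is an assumption of the sketch.
        if count > 0:
            doc |= 1 << i
    return doc


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two packed binary vectors."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    d1 = document_vector(["aspirin", "headache", "pain"])
    d2 = document_vector(["aspirin", "headache", "fever"])
    d3 = document_vector(["genome", "sequencing", "assembly"])
    # Documents sharing terms should lie closer in Hamming distance.
    print(hamming(d1, d2), hamming(d1, d3))
```

Because each term vector is a pure function of the term string, independent workers in a distributed setting would produce identical vectors without sharing state, which is the property the abstract credits with easing distributed implementations. Packing the bits into integers also hints at why binary vectors are economical to hold in RAM for nearest-neighbor search.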