Deterministic binary vectors for efficient automated indexing of MEDLINE/PubMed abstracts

Bibliographic details
Published in: AMIA ... Annual Symposium proceedings, vol. 2012, p. 940
Main authors: Wahle, Manuel; Widdows, Dominic; Herskovic, Jorge R; Bernstam, Elmer V; Cohen, Trevor
Format: Journal Article
Language: English
Published: United States, 2012
ISSN: 1942-597X, 1559-4076
Abstract: The need to maintain accessibility of the biomedical literature has led to development of methods to assist human indexers by recommending index terms for newly encountered articles. Given the rapid expansion of this literature, it is essential that these methods be scalable. Document vector representations are commonly used for automated indexing, and Random Indexing (RI) provides the means to generate them efficiently. However, RI is difficult to implement in real-world indexing systems, as (1) efficient nearest-neighbor search requires retaining all document vectors in RAM, and (2) it is necessary to maintain a store of randomly generated term vectors to index future documents. Motivated by these concerns, this paper documents the development and evaluation of a deterministic binary variant of RI. The increased capacity demonstrated by binary vectors has implications for information retrieval, and the elimination of the need to retain term vectors facilitates distributed implementations, enhancing the scalability of RI.
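The abstract does not say how the deterministic binary vectors are constructed, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes that a term's vector is derived from a hash of the term (so no term-vector store is needed), that document vectors are built by per-bit majority vote, and that nearest neighbors are ranked by Hamming distance. All names (`term_vector`, `document_vector`, `hamming`, `nearest`, `DIM`) and the example terms are hypothetical.

```python
import hashlib

DIM = 1024  # vector length in bits (assumed; must be a multiple of 8)


def term_vector(term: str, dim: int = DIM) -> int:
    """Derive a term's binary vector from the term itself, so nothing is stored."""
    # Assumption: SHAKE-256 output stands in for the "random" term vector.
    digest = hashlib.shake_256(term.encode("utf-8")).digest(dim // 8)
    return int.from_bytes(digest, "big")  # dim-bit vector packed into an int


def document_vector(terms, dim: int = DIM) -> int:
    """Superpose term vectors by per-bit majority vote (ties resolve to 0)."""
    votes = [0] * dim
    for term in terms:
        vec = term_vector(term, dim)
        for i in range(dim):
            votes[i] += 1 if (vec >> i) & 1 else -1
    doc = 0
    for i, v in enumerate(votes):
        if v > 0:
            doc |= 1 << i
    return doc


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two packed vectors differ."""
    return bin(a ^ b).count("1")


def nearest(query_terms, index, k=1):
    """Return the k document ids whose vectors are closest to the query."""
    q = document_vector(query_terms)
    return sorted(index, key=lambda doc_id: hamming(q, index[doc_id]))[:k]


# Toy index with two hypothetical documents.
index = {
    "doc_pain": document_vector(["aspirin", "pain", "fever"]),
    "doc_genome": document_vector(["genome", "sequencing", "assembly"]),
}
print(nearest(["aspirin", "pain", "headache"], index))
```

Because each term vector is recomputed from the term on demand, any worker in a distributed setup would produce the same vector without sharing state, and each document costs only `DIM / 8` bytes when packed. Both points illustrate the motivation given in the abstract; the hashing and voting details are assumptions.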
Authors and affiliations:
1. Wahle, Manuel (The University of Texas Health Science Center at Houston, School of Biomedical Informatics, USA)
2. Widdows, Dominic
3. Herskovic, Jorge R
4. Bernstam, Elmer V
5. Cohen, Trevor
PubMed record: https://www.ncbi.nlm.nih.gov/pubmed/23304369
Discipline: Medicine
Genre: Journal Article; Research Support, N.I.H., Extramural
Grants: NLM NIH HHS, R21 LM010826; NLM NIH HHS, R21LM010826-01; NCRR NIH HHS, UL1 RR024148
Peer reviewed: yes
Scholarly: yes
PMID: 23304369
Journal: AMIA ... Annual Symposium proceedings (AMIA Annu Symp Proc)
Subject terms: Abstracting and Indexing - methods; Medical Subject Headings; MEDLINE; Models, Theoretical; Natural Language Processing; PubMed
ProQuest record: https://www.proquest.com/docview/1273396635