Deterministic binary vectors for efficient automated indexing of MEDLINE/PubMed abstracts

The need to maintain accessibility of the biomedical literature has led to development of methods to assist human indexers by recommending index terms for newly encountered articles. Given the rapid expansion of this literature, it is essential that these methods be scalable. Document vector represe...

Celý popis

Uloženo v:
Podrobná bibliografie
Vydáno v:AMIA ... Annual Symposium proceedings Ročník 2012; s. 940
Hlavní autoři: Wahle, Manuel, Widdows, Dominic, Herskovic, Jorge R, Bernstam, Elmer V, Cohen, Trevor
Médium: Journal Article
Jazyk:angličtina
Vydáno: United States 2012
Témata:
ISSN:1942-597X, 1559-4076
On-line přístup:Zjistit podrobnosti o přístupu
Tagy: Přidat tag
Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!
Popis
Shrnutí:The need to maintain accessibility of the biomedical literature has led to development of methods to assist human indexers by recommending index terms for newly encountered articles. Given the rapid expansion of this literature, it is essential that these methods be scalable. Document vector representations are commonly used for automated indexing, and Random Indexing (RI) provides the means to generate them efficiently. However, RI is difficult to implement in real-world indexing systems, as (1) efficient nearest-neighbor search requires retaining all document vectors in RAM, and (2) it is necessary to maintain a store of randomly generated term vectors to index future documents. Motivated by these concerns, this paper documents the development and evaluation of a deterministic binary variant of RI. The increased capacity demonstrated by binary vectors has implications for information retrieval, and the elimination of the need to retain term vectors facilitates distributed implementations, enhancing the scalability of RI.
Bibliografie:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 23
ISSN:1942-597X
1559-4076