Bag of textual graphs (BoTG): A general graph‐based text representation model

Bibliographic details
Published in: Journal of the Association for Information Science and Technology, Vol. 70, No. 8, pp. 817-829
Main authors: Dourado, Ícaro Cavalcante; Galante, Renata; Gonçalves, Marcos André; Silva Torres, Ricardo
Format: Journal Article
Language: English
Published: Hoboken, USA: John Wiley & Sons, Inc., 1 August 2019
ISSN: 2330-1635, 2330-1643
Description
Summary: Text representation models are the fundamental basis for information retrieval and text mining tasks. Although different text models have been proposed, they typically target specific task aspects in isolation, such as time efficiency, accuracy, or applicability for different scenarios. Here we present Bag of Textual Graphs (BoTG), a general text representation model that addresses these three requirements at the same time. The proposed textual representation is based on a graph‐based scheme that encodes term proximity and term ordering, and represents text documents in an efficient vector space that addresses all these aspects as well as provides discriminative textual patterns. Extensive experiments are conducted in two experimental scenarios, classification and retrieval, considering multiple well‐known text collections. We also compare our model against several methods from the literature. Experimental results demonstrate that our model is generic enough to handle different tasks and collections. It is also more efficient than the widely used state‐of‐the‐art methods in textual classification and retrieval tasks, with a competitive effectiveness, sometimes with gains by large margins.
DOI: 10.1002/asi.24167