Automatic Taxonomy Extraction Using Google and Term Dependency

Bibliographic Details
Published in:	Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, pp. 321-325
Main Authors:	Makrehchi, Masoud; Kamel, Mohamed S.
Format:	Conference Paper
Language:	English
Published:	Washington, DC, USA: IEEE Computer Society, 2 November 2007
Series:	ACM Conferences
Subjects:	Information systems > Information retrieval; Information systems > Information retrieval > Document representation; Information systems > Information retrieval > Evaluation of retrieval results; Information systems > Information retrieval > Search engine architectures and scalability > Search engine indexing; Information systems > Information systems applications > Data mining; Mathematics of computing > Information theory
ISBN:	0769530265, 9780769530260

Description
Summary:	An automatic taxonomy extraction algorithm is proposed. Given a set of terms or terminology related to a subject domain, the proposed approach uses Google page counts to estimate the dependency links between the terms. A taxonomic link is an asymmetric relation between two concepts, so neither mutual information nor normalized Google distance, both of which are symmetric, can be used to extract these directed links. Using a new measure, the information-theoretic inclusion index, a term dependency matrix representing the pairwise dependencies is obtained. Next, a proposed algorithm converts the dependency matrix into an adjacency matrix representing the taxonomy tree. To evaluate its performance, the proposed approach is applied to taxonomy extraction in several domains.
DOI:	10.1109/WI.2007.26
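
The pipeline outlined in the summary (page counts, then an asymmetric dependency matrix, then an adjacency matrix for the tree) can be illustrated with a minimal sketch. This record does not define the paper's information-theoretic inclusion index or its matrix conversion algorithm, so the code below substitutes two plainly simpler stand-ins: a conditional co-occurrence ratio as the asymmetric measure, and a "strongest dependency on a more general term" rule for choosing each parent. The page counts are invented numbers, not real search results, and all function and variable names are hypothetical.

```python
from typing import Dict, FrozenSet, List

# Hypothetical page counts (made-up values, not real Google results).
SINGLE: Dict[str, int] = {
    "science": 1000, "physics": 400, "chemistry": 300, "optics": 50,
}
JOINT: Dict[FrozenSet[str], int] = {
    frozenset(("science", "physics")): 300,
    frozenset(("science", "chemistry")): 240,
    frozenset(("science", "optics")): 30,
    frozenset(("physics", "chemistry")): 60,
    frozenset(("physics", "optics")): 45,
    frozenset(("chemistry", "optics")): 5,
}


def dependency_matrix(terms: List[str]) -> List[List[float]]:
    """dep[i][j] = count(i AND j) / count(i).

    Assumption: this ratio is a stand-in for the paper's inclusion index.
    It is asymmetric, unlike mutual information or normalized Google distance.
    """
    n = len(terms)
    dep = [[0.0] * n for _ in range(n)]
    for i, a in enumerate(terms):
        for j, b in enumerate(terms):
            if i != j:
                dep[i][j] = JOINT[frozenset((a, b))] / SINGLE[a]
    return dep


def to_tree(dep: List[List[float]]) -> List[List[int]]:
    """Return adj where adj[p][c] == 1 means p is the parent of c.

    Assumption: a term's parent is the term it depends on most strongly,
    restricted to terms that depend on it less in return. With the ratio
    above, such a parent always has a larger page count, so no cycles arise.
    Terms with no candidate parent become roots (the result may be a forest).
    """
    n = len(dep)
    adj = [[0] * n for _ in range(n)]
    for c in range(n):
        candidates = [p for p in range(n) if p != c and dep[c][p] > dep[p][c]]
        if candidates:
            parent = max(candidates, key=lambda p: dep[c][p])
            adj[parent][c] = 1
    return adj


if __name__ == "__main__":
    terms = ["science", "physics", "chemistry", "optics"]
    adj = to_tree(dependency_matrix(terms))
    for p, row in enumerate(adj):
        for c, linked in enumerate(row):
            if linked:
                print(f"{terms[p]} -> {terms[c]}")
```

With these invented counts, "optics" attaches to "physics" and not directly to "science", because nearly all of its pages also mention "physics". A symmetric measure could not tell which of the two terms is the more general one.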