Pre-Processing Structured Data for Standard Machine Learning Algorithms by Supervised Graph Propositionalization - A Case Study with Medicinal Chemistry Datasets

Graph propositionalization methods can be used to transform structured and relational data into fixed-length feature vectors, enabling standard machine learning algorithms to be used for generating predictive models. It is however not clear how well different propositionalization methods work in con...

Celý popis

Uloženo v:

Podrobná bibliografie
Vydáno v:	2010 International Conference on Machine Learning and Applications s. 828 - 833
Hlavní autoři:	Karunaratne, Thashmee, Bostrom, Henrik, Norinder, U
Médium:	Konferenční příspěvek
Jazyk:	angličtina
Vydáno:	IEEE 01.12.2010
Témata:	Chemistry Classification algorithms Computer and systems science Computer and Systems Sciences Data mining Data- och systemvetenskap graph propositionalization Informatics, computer and systems science Informatik, data- och systemvetenskap Itemsets k-nearest neighbor Kernel Machine learning Machine learning algorithms medicinal chemistry random forests SAMHÄLLSVETENSKAP SOCIAL SCIENCES Statistics, computer and systems science Statistik, data- och systemvetenskap structured data support-vector machines
ISBN:	1424492114, 9781424492114
On-line přístup:	Získat plný text
Tagy:	Přidat tag Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!

Popis
Shrnutí:	Graph propositionalization methods can be used to transform structured and relational data into fixed-length feature vectors, enabling standard machine learning algorithms to be used for generating predictive models. It is however not clear how well different propositionalization methods work in conjunction with different standard machine learning algorithms. Three different graph propositionalization methods are investigated in conjunction with three standard learning algorithms: random forests, support vector machines and nearest neighbor classifiers. An experiment on 21 datasets from the domain of medicinal chemistry shows that the choice of propositionalization method may have a significant impact on the resulting accuracy. The empirical investigation further shows that for datasets from this domain, the use of the maximal frequent item set approach for propositionalization results in the most accurate classifiers, significantly outperforming the two other graph propositionalization methods considered in this study, SUBDUE and MOSS, for all three learning methods.
ISBN:	1424492114 9781424492114
DOI:	10.1109/ICMLA.2010.128