Smart Operators for Inducing Colorectal Cancer Classification Trees with PonyGE2 Grammatical Evolution Python Package

Bibliographic details
Published in: 2022 IEEE Congress on Evolutionary Computation (CEC), pp. 1-9
Main authors: Delgado-Osuna, Jose A.; Garcia-Martinez, Carlos; Ventura, Sebastian
Format: Conference paper
Language: English
Published: IEEE, 18 July 2022
Description
Summary: Colorectal cancer is a disease that affects many people and requires a multidisciplinary approach, involving significant human and economic resources. We have been provided with a tabular dataset with 1.5 thousand cases of this disease. We are interested in producing interpretable classifiers for predicting the occurrence of complications. Grammatical Evolution has been used extensively for machine learning problems. In particular, it can be used to induce interpretable decision trees, with the advantage that the practitioner can easily control the language by means of the grammar. PonyGE2 [1], [2] is a Python package that provides data scientists with Grammatical Evolution algorithms, which can be configured to their needs quite easily. In addition, thanks to the benefits of the Python programming language, PonyGE2 is becoming increasingly popular. However, the capabilities of PonyGE2 for inducing classification trees can still be improved. In particular, it only uses simple equality conditions and requires feature names and values to be encoded as numbers. We have developed some smart operators for PonyGE2, which not only enhance the framework in interpretability and performance when dealing with our colorectal cancer dataset, but also produce results comparable to those of the widely known heuristic methods C4.5 and CART. We show how they could be applied to other datasets, and how they affect performance in our case.
DOI: 10.1109/CEC55065.2022.9870361
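
Illustration: the summary describes how Grammatical Evolution induces decision trees whose language is controlled by a grammar, using only equality conditions over numerically encoded features. The sketch below shows the general genotype-to-phenotype mapping idea in plain Python. It does not use PonyGE2 or reproduce the authors' operators; the grammar, the feature names `x[0]` and `x[1]`, the function `map_genome`, and the `max_wraps` parameter are all assumptions made for this example, and consuming one codon at every non-terminal is a simplification.

```python
# Toy grammar (hypothetical, not taken from the paper): decision trees whose
# conditions are equality tests on numerically encoded features and values.
GRAMMAR = {
    "<tree>": [["if", "<cond>", "then", "<tree>", "else", "<tree>"], ["<class>"]],
    "<cond>": [["<feature>", "==", "<value>"]],
    "<feature>": [["x[0]"], ["x[1]"]],
    "<value>": [["0"], ["1"]],
    "<class>": [["0"], ["1"]],
}


def map_genome(genome, start="<tree>", max_wraps=2):
    """Map a list of integer codons to a phrase of the grammar.

    The leftmost non-terminal is expanded first, and the production is chosen
    as codon % number_of_choices. The genome may be reused (wrapped) up to
    max_wraps times; if that is not enough, the individual is invalid (None).
    """
    symbols = [start]          # pending symbols, leftmost first
    output = []                # terminals emitted so far
    used = 0                   # codons consumed, counting wraps
    limit = len(genome) * (max_wraps + 1)
    while symbols:
        sym = symbols.pop(0)
        if sym not in GRAMMAR:  # terminal symbol: emit it as-is
            output.append(sym)
            continue
        if used >= limit:       # ran out of codons: invalid individual
            return None
        choices = GRAMMAR[sym]
        codon = genome[used % len(genome)]
        used += 1
        symbols = list(choices[codon % len(choices)]) + symbols
    return " ".join(output)


if __name__ == "__main__":
    print(map_genome([0, 0, 1, 1, 1, 1, 1, 0]))  # if x[1] == 1 then 1 else 0
    print(map_genome([0]))                        # None (never reaches a leaf)
```

Because the grammar is ordinary data, richer condition types or readable feature names could in principle be added by editing the production lists, which is the kind of control over the language that the summary refers to.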