Active Learning in Genetic Programming: Guiding Efficient Data Collection for Symbolic Regression

Bibliographic details
Title: Active Learning in Genetic Programming: Guiding Efficient Data Collection for Symbolic Regression
Authors: Nathan Haut, Wolfgang Banzhaf, Bill Punch
Source: IEEE Transactions on Evolutionary Computation. 29:1100-1111
Publication Status: Preprint
Publisher information: Institute of Electrical and Electronics Engineers (IEEE), 2025.
Publication year: 2025
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Neural and Evolutionary Computing, Neural and Evolutionary Computing (cs.NE), Machine Learning (cs.LG)
Description: This paper examines various methods of computing uncertainty and diversity for active learning in genetic programming. We found that the model population in genetic programming can be exploited to select informative training data points by using a model ensemble combined with an uncertainty metric. We explored several uncertainty metrics and found that differential entropy performed the best. We also compared two data diversity metrics and found that correlation as a diversity metric performs better than minimum Euclidean distance, although there are some drawbacks that prevent correlation from being used on all problems. Finally, we combined uncertainty and diversity using a Pareto optimization approach to allow both to be considered in a balanced way to guide the selection of informative and unique data points for training.
Document type: Article
ISSN: 1941-0026, 1089-778X
DOI: 10.1109/tevc.2024.3471341
DOI: 10.48550/arxiv.2308.00672
Access URL: http://arxiv.org/abs/2308.00672
Rights: CC BY
Accession number: edsair.doi.dedup.....99e84dae06c82d0d9b16e23ff76e01a3
Database: OpenAIRE
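
The selection idea summarized in the abstract (ensemble uncertainty, data diversity, and a Pareto combination of the two) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no formulas, so the Gaussian form of differential entropy used here is an assumption, and the function names `differential_entropy`, `min_distance`, and `pareto_front`, as well as the three toy models, are hypothetical. Only the minimum Euclidean distance diversity metric is shown; the correlation-based variant is omitted.

```python
import numpy as np


def differential_entropy(preds):
    """Per-candidate uncertainty of an ensemble.

    preds has shape (n_models, n_candidates). ASSUMPTION: predictions at each
    candidate are treated as Gaussian, so the differential entropy is
    0.5 * ln(2 * pi * e * variance).
    """
    var = np.var(preds, axis=0) + 1e-12  # small offset avoids log(0)
    return 0.5 * np.log(2.0 * np.pi * np.e * var)


def min_distance(candidates, training):
    """Minimum Euclidean distance from each candidate to any training point."""
    diffs = candidates[:, None, :] - training[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)


def pareto_front(scores):
    """Indices of rows that no other row dominates (higher is better)."""
    front = []
    for i in range(scores.shape[0]):
        at_least_as_good = np.all(scores >= scores[i], axis=1)
        strictly_better = np.any(scores > scores[i], axis=1)
        if not np.any(at_least_as_good & strictly_better):
            front.append(i)
    return front


# Hypothetical ensemble of three candidate models (stand-ins for GP individuals).
models = [lambda x: x, lambda x: x ** 2, lambda x: 2 * x - 1]

training = np.array([[1.0]])                       # points already collected
candidates = np.array([[0.0], [1.0], [2.0], [3.0]])  # points we could collect

preds = np.array([m(candidates[:, 0]) for m in models])
scores = np.column_stack([
    differential_entropy(preds),        # objective 1: model disagreement
    min_distance(candidates, training),  # objective 2: distance from known data
])
selected = pareto_front(scores)  # indices of candidates worth sampling next
```

In this toy setup all three models agree at the existing training point, so that candidate scores lowest on both objectives. Returning the whole non-dominated set, rather than a weighted sum, avoids having to choose a fixed trade-off between uncertainty and diversity.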