Active Learning in Genetic Programming: Guiding Efficient Data Collection for Symbolic Regression
| Published in: | IEEE Transactions on Evolutionary Computation Vol. 29; no. 4; pp. 1100-1111 |
|---|---|
| Main Authors: | , , |
| Format: | Journal Article |
| Language: | English |
| Published: | IEEE 01.08.2025 |
| Subjects: | |
| ISSN: | 1089-778X, 1941-0026 |
| Summary: | This article examines various methods of computing uncertainty and diversity for active learning in genetic programming. We found that the model population in genetic programming can be exploited to select informative training data points by using a model ensemble combined with an uncertainty metric. We explored several uncertainty metrics and found that differential entropy performed the best. We also compared two data diversity metrics and found that correlation as a diversity metric performs better than minimum Euclidean distance, although there are some drawbacks that prevent correlation from being used on all problems. Finally, we combined uncertainty and diversity using a Pareto optimization approach to allow both to be considered in a balanced way to guide the selection of informative and unique data points for training. |
| DOI: | 10.1109/TEVC.2024.3471341 |
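
The selection idea described in the summary can be illustrated with a small sketch. This is not the authors' implementation: it assumes the ensemble predictions at each candidate point are roughly Gaussian (so differential entropy reduces to a function of the prediction variance), it uses minimum Euclidean distance as the diversity metric because it is the simpler of the two mentioned, and all function names and toy values are hypothetical.

```python
import numpy as np

def gaussian_differential_entropy(ensemble_preds):
    """Per-candidate differential entropy, assuming Gaussian ensemble predictions.

    ensemble_preds has shape (n_models, n_candidates).
    """
    var = np.var(ensemble_preds, axis=0) + 1e-12  # small floor avoids log(0)
    return 0.5 * np.log(2.0 * np.pi * np.e * var)

def min_euclidean_distance(candidates, training_points):
    """Distance from each candidate to its nearest existing training point."""
    diffs = candidates[:, None, :] - training_points[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)

def pareto_front(uncertainty, diversity):
    """Indices of candidates not dominated when both scores are maximized."""
    scores = np.column_stack([uncertainty, diversity])
    front = []
    for i, s in enumerate(scores):
        # A point is dominated if another point is at least as good on both
        # scores and strictly better on at least one.
        dominated = np.any(
            np.all(scores >= s, axis=1) & np.any(scores > s, axis=1)
        )
        if not dominated:
            front.append(i)
    return front

# Toy data (hypothetical): 3 models from the population, 4 candidate points.
preds = np.array([
    [1.0, 2.0, 0.0, 5.0],
    [1.0, 2.5, 0.0, 1.0],
    [1.0, 1.5, 0.0, 3.0],
])
candidates = np.array([[0.0], [1.0], [2.0], [3.0]])
training = np.array([[0.0], [2.5]])

u = gaussian_differential_entropy(preds)
d = min_euclidean_distance(candidates, training)
front = pareto_front(u, d)
print(front)
```

On this toy data the front contains candidates 1 and 3: candidate 3 has the largest ensemble disagreement, while candidate 1 is farthest from the existing training points. How a single point is then chosen from the front is a separate design decision that this sketch leaves open.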