Multi-parameter control for the (1+(λ,λ))-GA on OneMax via deep reinforcement learning

Saved in:
Bibliographic Details
Title: Multi-parameter control for the (1+(λ,λ))-GA on OneMax via deep reinforcement learning
Authors: Nguyen, Tai; Le, Phong; Doerr, Carola; Dang, Nguyen
Source: Nguyen, T., Le, P., Doerr, C. & Dang, N. 2025, Multi-parameter control for the (1+(λ,λ))-GA on OneMax via deep reinforcement learning. In Proceedings of the 18th ACM/SIGEVO Conference on Foundations of Genetic Algorithms. ACM, Conference on Foundations of Genetic Algorithms, Leiden, Netherlands, 27/08/25. https://doi.org/10.1145/3729878.3746703
Publisher: ACM
Publication Year: 2025
Keywords: Deep reinforcement learning, Dynamic algorithm configuration, Optimisation
Description: It is well known that evolutionary algorithms can benefit from dynamic choices of the key parameters that control their behavior, to adjust their search strategy to the different stages of the optimization process. A prominent example where dynamic parameter choices have shown a provable super-constant speed-up is the (1+(λ,λ)) Genetic Algorithm optimizing the OneMax function. While optimal parameter control policies result in linear expected running times, this is not possible with static parameter choices. This result has spurred a lot of interest in parameter control policies. However, many works, in particular theoretical running time analyses, focus on controlling a single parameter. Deriving policies for controlling multiple parameters remains very challenging. In this work we reconsider the problem of the (1+(λ,λ)) Genetic Algorithm optimizing OneMax. We decouple its four main parameters and investigate how well state-of-the-art deep reinforcement learning techniques can approximate good control policies. We show that although making deep reinforcement learning learn effectively is a challenging task, once it works, it is very powerful and is able to find policies that outperform all previously known control policies on the same benchmark. Based on the results found through reinforcement learning, we derive a simple control policy that consistently outperforms the default theory-recommended setting by 27% and the irace-tuned policy, the strongest existing control policy on this benchmark, by 13%, for all tested problem sizes up to 40,000.
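For a concrete picture of the controlled algorithm, the following is a minimal, illustrative Python sketch (not taken from the paper) of the OneMax fitness function and the default theory-recommended parameter coupling for the (1+(λ,λ))-GA, in which all parameters are tied to a single value λ = sqrt(n/(n−f(x))); the paper's contribution is to decouple these parameters and learn separate control policies for them via deep reinforcement learning.

import math

def onemax(x):
    # Fitness = number of ones in the bit string x (list of 0/1).
    return sum(x)

def theory_policy(n, fitness):
    # Default theory-recommended coupling for the (1+(lambda,lambda))-GA:
    # lambda = sqrt(n / (n - f(x))), mutation rate p = lambda / n,
    # crossover bias c = 1 / lambda. Offspring population sizes are
    # both set to round(lambda).
    lam = math.sqrt(n / max(1, n - fitness))
    mutation_rate = lam / n
    crossover_bias = 1.0 / lam
    return max(1, round(lam)), mutation_rate, crossover_bias

# Example: query the policy at different stages of a run with n = 1000.
n = 1000
for fitness in (500, 900, 990, 999):
    print(fitness, theory_policy(n, fitness))

As the sketch shows, the coupled policy grows λ as the search approaches the optimum; the learned policies in the paper are instead free to set each of the four parameters independently at every fitness level.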
Publication Type: conference paper (conference proceedings)
Language: English
DOI: 10.1145/3729878.3746703
Availability: https://research-portal.st-andrews.ac.uk/en/publications/f6c918e8-0d5a-45b5-ab32-78022b6d065a
https://doi.org/10.1145/3729878.3746703
https://dl.acm.org/conference/foga
https://arxiv.org/abs/2505.12982
Rights: info:eu-repo/semantics/openAccess
Document Code: edsbas.DB721B5E
Database: BASE