Reinforcement learning for search tree size minimization in Constraint Programming: New results on scheduling benchmarks

Bibliographic Details
Published in:	Computers & Industrial Engineering, Vol. 209; p. 111413
Main Authors:	Heinz, Vilém; Vilím, Petr; Hanzálek, Zdeněk
Format:	Journal Article
Language:	English
Published:	Elsevier Ltd, 1 November 2025
Subjects:	Constraint Programming; Discrete optimization; Heuristics; Reinforcement learning; Scheduling; Tree search; 68T20; 90-08; 90B35; 90C27; 90C59; 90C99
ISSN:	0360-8352
Online Access:	Full text

Description
Abstract:	Failure-Directed Search (FDS) is a complete, generic search algorithm used in Constraint Programming (CP) to explore the search space efficiently, and it has proven particularly effective on scheduling problems. This paper analyzes the properties of FDS and shows that minimizing the size of its search tree, guided by ranked branching decisions, is closely related to the Multi-armed bandit (MAB) problem. Building on this insight, MAB reinforcement learning algorithms are applied to FDS, extended with problem-specific refinements and parameter tuning, and evaluated on the two most fundamental scheduling problems: the Job Shop Scheduling Problem (JSSP) and the Resource-Constrained Project Scheduling Problem (RCPSP). The resulting enhanced FDS, using the best extended MAB algorithm and configuration, is 1.7 times faster on the JSSP benchmarks and 2.5 times faster on the RCPSP benchmarks than the original implementation in a new solver called OptalCP. It is also 3.5 times faster on the JSSP benchmarks and 2.1 times faster on the RCPSP benchmarks than the current state-of-the-art FDS algorithm in IBM CP Optimizer 22.1. Furthermore, with a time limit of only 900 s per instance, the enhanced FDS improved the existing state-of-the-art lower bounds for 78 of 84 JSSP and 226 of 393 RCPSP standard open benchmark instances, and completely closed a few of them.

• Reinforcement learning strongly improves Failure-Directed Search (FDS) efficiency.
• FDS parameter tuning yields noticeable improvement and insight into the importance of the parameters.
• A two-fold improvement over baseline FDS is achieved on fundamental scheduling problems.
• An even larger improvement is achieved over the state-of-the-art FDS in CP Optimizer.
• Hundreds of improved lower bounds for well-known JSSP and RCPSP instances were obtained.
DOI:	10.1016/j.cie.2025.111413
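
Note on the multi-armed bandit idea mentioned in the abstract: the abstract does not say which MAB algorithm or reward definition gave the best results, so the following is only a generic sketch of one textbook bandit algorithm (UCB1), not the authors' method and not the API of OptalCP or CP Optimizer. The class `UCB1`, the function `simulate`, and the Bernoulli reward model are all assumptions made for illustration; the arms merely stand in for competing branching choices whose quality is learned from feedback.

```python
import math
import random


class UCB1:
    """Textbook UCB1 bandit. Arms are hypothetical stand-ins for branching choices."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms    # how many times each arm was tried
        self.means = [0.0] * n_arms   # running mean reward of each arm

    def select(self):
        # Try every arm once before relying on the confidence bound.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        total = sum(self.counts)
        # Mean reward plus an exploration bonus that shrinks as an arm is tried more.
        scores = [
            mean + math.sqrt(2.0 * math.log(total) / count)
            for mean, count in zip(self.means, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, reward):
        self.counts[arm] += 1
        # Incremental update of the running mean.
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def simulate(success_probs, steps, seed=0):
    """Run UCB1 against Bernoulli arms (an assumed reward model, not the paper's)."""
    rng = random.Random(seed)
    bandit = UCB1(len(success_probs))
    for _ in range(steps):
        arm = bandit.select()
        reward = 1.0 if rng.random() < success_probs[arm] else 0.0
        bandit.update(arm, reward)
    return bandit


if __name__ == "__main__":
    result = simulate([0.2, 0.5, 0.8], steps=5000)
    print("pull counts per arm:", result.counts)
    print("estimated means:", [round(m, 3) for m in result.means])
```

In this toy setting the bandit gradually concentrates its pulls on the arm with the highest observed reward while still sampling the others occasionally, which is the exploration and exploitation trade-off that the abstract relates to ranking branching decisions.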