Enhancing neural combinatorial optimization by progressive training paradigm
Saved in:
| Published in: | Neurocomputing (Amsterdam), Volume 659, p. 131707 |
|---|---|
| Main authors: | , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier B.V., 01.01.2026 |
| ISSN: | 0925-2312 |
| Summary: | Neural Combinatorial Optimization (NCO) methods have garnered considerable attention due to their effectiveness in automatic algorithm design for solving combinatorial optimization problems. Current constructive NCO methods predominantly employ a one-stage training paradigm using either reinforcement learning (RL) or supervised learning (SL). The one-stage training inevitably entails the computation-intensive labeling (i.e., solving for optimal solutions) in SL or less-informative sparse rewards in RL. In this work, we propose a progressive training paradigm that pre-trains a neural network on small-scale instances using SL and then fine-tunes it using RL. In the former stage, the optimal solutions as labels effectively guide the neural network training, thereby bypassing the sparse reward issue. In the latter, the neural network is trained using RL to solve large-scale problems, avoiding the labels of optimal solutions that are hard to obtain. Moreover, we propose a decomposition-based approach that enables RL training with larger problem scales, alleviating the issue of insufficient memory induced by the heavy neural network. The proposed paradigm advances existing NCO models to obtain near-optimal solutions for the Traveling Salesman Problem (TSP) and Capacitated Vehicle Routing Problem (CVRP) with up to 10,000 nodes. Additionally, it enhances the generalization performance across instances of different sizes and distributions, as well as real-world TSPLib and CVRPLib instances. |
|---|---|
| DOI: | 10.1016/j.neucom.2025.131707 |
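The progressive paradigm the abstract describes (supervised pre-training on small instances with brute-force optimal labels, then REINFORCE fine-tuning at a scale where labels are too expensive) can be caricatured as follows. This is a toy sketch, not the paper's model: the heavy constructive neural network is replaced by a simple log-linear policy over city-to-city transitions, and all feature choices, learning rates, and instance sizes are illustrative assumptions.

```python
# Toy sketch of the progressive SL -> RL training paradigm for TSP.
# A log-linear policy stands in for the paper's neural constructive model.
import itertools
import numpy as np

rng = np.random.default_rng(0)

def tour_length(coords, tour):
    """Total length of the closed tour over the given city coordinates."""
    return sum(np.linalg.norm(coords[a] - coords[b])
               for a, b in zip(tour, tour[1:] + tour[:1]))

def optimal_tour(coords):
    """Brute-force label for a tiny instance (the 'computation-intensive labeling')."""
    n = len(coords)
    best = min(itertools.permutations(range(1, n)),
               key=lambda p: tour_length(coords, [0] + list(p)))
    return [0] + list(best)

def features(coords, i, j):
    # Illustrative hand-crafted features: negative distance plus a bias term.
    return np.array([-np.linalg.norm(coords[i] - coords[j]), 1.0])

def sample_tour(theta, coords, greedy=False):
    """Construct a tour step by step; softmax policy over unvisited cities."""
    tour, unvisited = [0], set(range(1, len(coords)))
    while unvisited:
        cand = sorted(unvisited)
        logits = np.array([features(coords, tour[-1], j) @ theta for j in cand])
        p = np.exp(logits - logits.max()); p /= p.sum()
        j = cand[int(np.argmax(p))] if greedy else int(rng.choice(cand, p=p))
        tour.append(j); unvisited.remove(j)
    return tour

def grad_log_prob(theta, coords, tour):
    """Gradient of log pi(tour); reused by SL cross-entropy and REINFORCE."""
    g, unvisited = np.zeros_like(theta), set(range(1, len(coords)))
    for t in range(len(tour) - 1):
        cand = sorted(unvisited)
        feats = np.array([features(coords, tour[t], j) for j in cand])
        logits = feats @ theta
        p = np.exp(logits - logits.max()); p /= p.sum()
        g += feats[cand.index(tour[t + 1])] - p @ feats
        unvisited.remove(tour[t + 1])
    return g

theta = np.zeros(2)

# Stage 1: supervised pre-training on small instances with optimal labels.
for _ in range(30):
    coords = rng.random((5, 2))
    theta += 0.1 * grad_log_prob(theta, coords, optimal_tour(coords))

# Stage 2: RL fine-tuning (REINFORCE, mean baseline) on larger instances,
# where computing optimal labels would be too expensive.
for _ in range(30):
    coords = rng.random((9, 2))
    tours = [sample_tour(theta, coords) for _ in range(8)]
    lens = np.array([tour_length(coords, t) for t in tours])
    for t, L in zip(tours, lens):
        theta += 0.02 * (lens.mean() - L) * grad_log_prob(theta, coords, t)

# Sanity check: the trained greedy policy should beat random tours on average.
coords = rng.random((9, 2))
trained = tour_length(coords, sample_tour(theta, coords, greedy=True))
random_avg = np.mean([tour_length(coords, list(rng.permutation(9)))
                      for _ in range(50)])
```

The design point the sketch mirrors is that the same log-probability gradient serves both stages: in SL it is ascended toward the labeled optimal tour (cross-entropy), in RL it is weighted by an advantage estimate, so pre-training simply hands the RL stage a well-initialized policy.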