Enhancing neural combinatorial optimization by progressive training paradigm


Detailed bibliography
Published in: Neurocomputing (Amsterdam), Vol. 659, p. 131707
Main authors: Cao, Zhi; Wu, Yaoxin; Hou, Yaqing; Ge, Hongwei
Format: Journal Article
Language: English
Published: Elsevier B.V., 01.01.2026
ISSN:0925-2312
Description
Summary: Neural Combinatorial Optimization (NCO) methods have garnered considerable attention, due to their effectiveness in automatic algorithm design for solving combinatorial optimization problems. Current constructive NCO methods predominantly employ a one-stage training paradigm using either reinforcement learning (RL) or supervised learning (SL). The one-stage training inevitably entails the computation-intensive labeling (i.e., solving optimal solutions) in SL or less-informative sparse rewards in RL. In this work, we propose a progressive training paradigm that pre-trains a neural network on small-scale instances using SL and then fine-tunes it using RL. In the former stage, the optimal solutions as labels effectively guide the neural network training, thereby bypassing the sparse reward issue. In the latter, the neural network is trained using RL to solve large-scale problems, avoiding the labels of optimal solutions that are hard to obtain. Moreover, we propose a decomposition-based approach that enables RL training with larger problem scales, alleviating the issue of insufficient memory induced by the heavy neural network. The proposed paradigm advances existing NCO models to obtain near-optimal solutions for the Traveling Salesman Problem (TSP) and Capacitated Vehicle Routing Problem (CVRP) with up to 10,000 nodes. Additionally, it enhances the generalization performance across instances of different sizes and distributions, as well as real-world TSPLib and CVRPLib instances.
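The two-stage idea the abstract describes can be sketched on a deliberately tiny TSP model: supervised pre-training on small instances with "expert" next-city labels, then label-free REINFORCE fine-tuning on larger instances. Everything below is an illustrative stand-in, not the paper's method: the policy is a single distance-weight parameter, the SL "labels" are nearest-neighbour moves rather than optimal-solver solutions, and all sizes and learning rates are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def tour_length(pts, tour):
    """Total length of a closed tour over 2-D points."""
    return sum(float(np.linalg.norm(pts[tour[i]] - pts[tour[(i + 1) % len(tour)]]))
               for i in range(len(tour)))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rollout(pts, w, greedy=False, rng=None):
    """Construct a tour city by city; logits are -w * distance-to-candidate."""
    tour, unvisited, steps = [0], list(range(1, len(pts))), []
    while unvisited:
        cur = pts[tour[-1]]
        dists = np.array([np.linalg.norm(pts[j] - cur) for j in unvisited])
        p = softmax(-w * dists)
        k = int(np.argmax(p)) if greedy else int(rng.choice(len(unvisited), p=p))
        steps.append((dists, k))
        tour.append(unvisited.pop(k))
    return tour, steps

# Stage 1: supervised pre-training on small (10-city) instances.
# NOTE: the labels here are cheap nearest-neighbour moves, standing in for
# the optimal-solver labels used in the paper.
w, lr = 0.0, 0.5
for _ in range(200):
    pts = rng.random((10, 2))
    cur, unvisited = 0, list(range(1, 10))
    while unvisited:
        dists = np.array([np.linalg.norm(pts[j] - pts[cur]) for j in unvisited])
        p = softmax(-w * dists)
        k = int(np.argmin(dists))  # "expert" label: nearest unvisited city
        # d/dw log p(label) = E_p[dist] - dist_label (non-negative here)
        w += lr * (float(p @ dists) - float(dists[k]))
        cur = unvisited.pop(k)
w_sl = w  # weight after supervised pre-training

# Stage 2: REINFORCE fine-tuning on larger (30-city) instances, no labels.
baseline = None
for _ in range(100):
    pts = rng.random((30, 2))
    tour, steps = rollout(pts, w, rng=rng)
    reward = -tour_length(pts, tour)
    baseline = reward if baseline is None else 0.9 * baseline + 0.1 * reward
    # grad of log-prob of the sampled tour w.r.t. w, summed over steps
    grad = sum(float(softmax(-w * d) @ d) - float(d[k]) for d, k in steps)
    w += 0.01 * (reward - baseline) * grad
```

The SL stage pushes `w` positive (prefer short next edges) using dense per-step labels; the RL stage then only needs a scalar tour-length reward, mirroring how the paper's fine-tuning avoids optimal labels on scales where solvers are impractical.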
DOI:10.1016/j.neucom.2025.131707