Enhancing neural combinatorial optimization by progressive training paradigm

Bibliographic details
Published in:	Neurocomputing (Amsterdam) Vol. 659; p. 131707
Main authors:	Cao, Zhi; Wu, Yaoxin; Hou, Yaqing; Ge, Hongwei
Format:	Journal Article
Language:	English
Published:	Elsevier B.V. 01.01.2026
Subjects:	Attention model; Neural combinatorial optimization; Reinforcement learning; Vehicle routing problem
ISSN:	0925-2312

Description
Summary:	Neural Combinatorial Optimization (NCO) methods have garnered considerable attention due to their effectiveness in automatic algorithm design for solving combinatorial optimization problems. Current constructive NCO methods predominantly employ a one-stage training paradigm using either reinforcement learning (RL) or supervised learning (SL). One-stage training inevitably entails either computation-intensive labeling (i.e., computing optimal solutions) in SL or sparse, less informative rewards in RL. In this work, we propose a progressive training paradigm that pre-trains a neural network on small-scale instances using SL and then fine-tunes it using RL. In the former stage, optimal solutions serve as labels that effectively guide the training of the neural network, thereby bypassing the sparse-reward issue. In the latter, the neural network is trained using RL to solve large-scale problems, avoiding the need for optimal-solution labels, which are hard to obtain. Moreover, we propose a decomposition-based approach that enables RL training at larger problem scales, alleviating the memory shortage caused by the heavy neural network. The proposed paradigm improves existing NCO models so that they obtain near-optimal solutions for the Traveling Salesman Problem (TSP) and the Capacitated Vehicle Routing Problem (CVRP) with up to 10,000 nodes. Additionally, it enhances generalization performance across instances of different sizes and distributions, as well as on real-world TSPLib and CVRPLib instances.
DOI:	10.1016/j.neucom.2025.131707
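
Illustrative sketch:	The two-stage idea in the summary (SL pre-training on small instances with optimal labels, then RL fine-tuning on larger instances without labels) can be illustrated with a toy example. This is not the authors' implementation: the one-parameter nearest-neighbor-style policy, the brute-force labeler, the REINFORCE update with a mean baseline, and all function names and hyperparameters below are assumptions chosen for brevity, and the decomposition-based approach is not shown.

```python
import itertools
import math
import random


def tour_length(coords, tour):
    """Length of the closed tour that visits coords in the given order."""
    n = len(tour)
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % n]])
               for i in range(n))


def brute_force_tour(coords):
    """Optimal tour by enumeration; only feasible for tiny instances."""
    best = min(itertools.permutations(range(1, len(coords))),
               key=lambda p: tour_length(coords, (0,) + p))
    return [0] + list(best)


def step_probs(coords, current, candidates, theta):
    """Toy policy (assumption): softmax over logits -theta * distance."""
    dists = [math.dist(coords[current], coords[c]) for c in candidates]
    logits = [-theta * d for d in dists]
    top = max(logits)
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    return dists, [w / total for w in weights]


def grad_log_prob(dists, probs, idx):
    """d/dtheta of the log-probability of choosing candidate idx."""
    return -dists[idx] + sum(p * d for p, d in zip(probs, dists))


def random_instance(n_nodes, rng):
    return [(rng.random(), rng.random()) for _ in range(n_nodes)]


def pretrain_sl(theta, n_nodes, steps, lr, rng):
    """Stage 1: imitate optimal tours on small instances (log-likelihood ascent)."""
    for _ in range(steps):
        coords = random_instance(n_nodes, rng)
        label = brute_force_tour(coords)
        grad = 0.0
        for t in range(n_nodes - 1):
            candidates = label[t + 1:]  # unvisited nodes; label's choice is index 0
            dists, probs = step_probs(coords, label[t], candidates, theta)
            grad += grad_log_prob(dists, probs, 0)
        theta += lr * grad
    return theta


def rollout(coords, theta, rng, greedy=False):
    """Build one tour; also return the summed gradient of its log-probability."""
    tour, grad = [0], 0.0
    unvisited = list(range(1, len(coords)))
    while unvisited:
        dists, probs = step_probs(coords, tour[-1], unvisited, theta)
        if greedy:
            idx = max(range(len(probs)), key=probs.__getitem__)
        else:
            idx = rng.choices(range(len(probs)), weights=probs)[0]
        grad += grad_log_prob(dists, probs, idx)
        tour.append(unvisited.pop(idx))
    return tour, grad


def finetune_rl(theta, n_nodes, steps, lr, rng, samples=8):
    """Stage 2: REINFORCE on larger instances; the only signal is tour length."""
    for _ in range(steps):
        coords = random_instance(n_nodes, rng)
        runs = [rollout(coords, theta, rng) for _ in range(samples)]
        lengths = [tour_length(coords, tour) for tour, _ in runs]
        baseline = sum(lengths) / samples
        grad = sum((length - baseline) * g
                   for length, (_, g) in zip(lengths, runs)) / samples
        theta -= lr * grad  # descend the expected tour length
    return theta


rng = random.Random(0)
theta = pretrain_sl(0.0, n_nodes=6, steps=50, lr=0.5, rng=rng)
theta = finetune_rl(theta, n_nodes=30, steps=50, lr=0.5, rng=rng)
coords = random_instance(30, rng)
tour, _ = rollout(coords, theta, rng, greedy=True)
print(f"theta={theta:.3f}, greedy tour length={tour_length(coords, tour):.3f}")
```

The sketch only mirrors the division of labor described above: labels are needed in the first stage alone, where enumeration is still cheap, while the second stage learns from sampled tour lengths on instances too large to label this way.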